Meaning of each bit in the 166-bit MACCS molecular fingerprint

Published: 2022-10-15

1. What are MACCS keys?

The MACCS (Molecular ACCess System) keys [1,2] are one of the most commonly used sets of structural keys. They are sometimes referred to as the MDL keys, after the company that developed them, MDL Information Systems (now BIOVIA). There are two sets of MACCS keys, one with 960 keys and one containing a subset of 166 keys, but only the fragment definitions of the shorter set are available to the public. These 166 public keys are implemented in popular open-source cheminformatics software packages, including RDKit [3], OpenBabel [4,5], and CDK [6,7].
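As a minimal sketch of how the 166 public keys can be generated in practice, the snippet below calls RDKit's `MACCSkeys.GenMACCSKeys`. The helper names `maccs_on_bits` and `to_bit_string` are illustrative and not part of RDKit, and RDKit is imported lazily so that the formatting helper works without it. RDKit returns a 167-bit vector in which bit 0 is unused, so bit i corresponds to key i.

```python
from typing import Iterable, List


def maccs_on_bits(smiles: str) -> List[int]:
    """Return the numbers of the MACCS keys that are set for a SMILES string."""
    # Imported lazily so that to_bit_string() below works without RDKit.
    from rdkit import Chem
    from rdkit.Chem import MACCSkeys

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    fp = MACCSkeys.GenMACCSKeys(mol)  # 167 bits; bit 0 is unused
    return list(fp.GetOnBits())


def to_bit_string(on_bits: Iterable[int], n_keys: int = 166) -> str:
    """Render keys 1..n_keys as a 0/1 string (first character is key 1)."""
    on = set(on_bits)
    return "".join("1" if key in on else "0" for key in range(1, n_keys + 1))
```

For example, `to_bit_string(maccs_on_bits("CCO"))` would give the 166-character fingerprint string for ethanol, assuming RDKit is installed.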

2. What is the exact meaning of each bit?

The fragment definitions for the MACCS 166 keys can be found in this document:

rdkit/MACCSkeys.py at master · rdkit/rdkit · GitHub

Additionally, the RDKit implementation expresses each key as a SMARTS pattern. SMARTS is a language for describing molecular patterns, so the official documentation is linked below; it details the meaning of every symbol in a SMARTS string.

Daylight Theory: SMARTS - A Language for Describing Molecular Patterns

The exact definitions of the MACCS 166 keys are listed in the table below.

To read the remarks more easily, note the following:

        1. Q in the table means a heteroatom, i.e. any atom other than carbon or hydrogen ([!#6;!#1] in the SMARTS code).

        2. A means any atom.

        3. Some chemical substructure drawings are provided as examples to illustrate the SMARTS code.
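Each row of the table below pairs a SMARTS pattern with an integer count. As a hedged sketch of how such a pair becomes a bit: the RDKit implementation is understood to set a key when the number of unique substructure matches exceeds the count, so a count of 0 means "at least one match". The function names here are illustrative; `count_matches` assumes RDKit is installed and imports it lazily.

```python
from typing import Dict, List

# key number -> count for the four oxygen-counting keys, copied from the table.
# All four share the SMARTS pattern '[#8]'.
OXYGEN_KEYS: Dict[int, int] = {164: 0, 159: 1, 146: 2, 140: 3}


def key_is_set(n_matches: int, count: int) -> bool:
    """A key is on when the pattern matches more than `count` times."""
    return n_matches > count


def count_matches(smiles: str, smarts: str) -> int:
    """Count unique matches of a SMARTS pattern in a molecule (needs RDKit)."""
    from rdkit import Chem  # lazy import: only needed for real molecules

    mol = Chem.MolFromSmiles(smiles)
    patt = Chem.MolFromSmarts(smarts)
    if mol is None or patt is None:
        raise ValueError("could not parse the SMILES or SMARTS input")
    return len(mol.GetSubstructMatches(patt))


def oxygen_keys_set(n_oxygens: int) -> List[int]:
    """Oxygen-counting keys that are on for a given number of oxygen atoms."""
    return sorted(k for k, c in OXYGEN_KEYS.items() if key_is_set(n_oxygens, c))
```

With three oxygen atoms, `oxygen_keys_set(3)` reports keys 146, 159, and 164, while key 140 (O > 3) stays off.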

No. (SMARTS CODE, COUNT) REMARK NOTE
1 ('?', 0) ISOTOPE whether an isotope is present
2 ('[#104]', 0) atomic number > 103 (pattern limited to #104, since RDKit only accepts up to #104)
3 ('[#32,#33,#34,#50,#51,#52,#82,#83,#84]', 0) IVa,Va,VIa Rows 4-6 periods 4-6, groups 14-16: Ge, As, Se, Sn, Sb, Te, Pb, Bi, Po
4 ('[Ac,Th,Pa,U,Np,Pu,Am,Cm,Bk,Cf,Es,Fm,Md,No,Lr]', 0) Actinide
5 ('[Sc,Ti,Y,Zr,Hf]', 0) Group IIIB,IVB periods 4-6, scandium-group and titanium-group elements
6 ('[La,Ce,Pr,Nd,Pm,Sm,Eu,Gd,Tb,Dy,Ho,Er,Tm,Yb,Lu]', 0) Lanthanide
7 ('[V,Cr,Mn,Nb,Mo,Tc,Ta,W,Re]', 0) Group VB,VIB,VIIB periods 4-6, vanadium-, chromium-, and manganese-group elements
8 ('[!#6;!#1]1~*~*~*~1', 0) QAAA@1 four-membered ring containing a heteroatom, any bonds
9 ('[Fe,Co,Ni,Ru,Rh,Pd,Os,Ir,Pt]', 0) Group VIII periods 4-6, iron-, cobalt-, and nickel-group elements
10 ('[Be,Mg,Ca,Sr,Ba,Ra]', 0) Group IIa (Alkaline earth)
11 ('*1~*~*~*~1', 0) 4M Ring four-membered ring, any atoms, any bonds
12 ('[Cu,Zn,Ag,Cd,Au,Hg]', 0) Group IB,IIB periods 4-6, copper-group and zinc-group elements
13 ('[#8]~[#7](~[#6])~[#6]', 0) ON(C)C
14 ('[#16]-[#16]', 0) S-S sulfur-sulfur single bond
15 ('[#8]~[#6](~[#8])~[#8]', 0) OC(O)O
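16 ('[!#6;!#1]1~*~*~1', 0) QAA@1 three-membered ring containing a heteroatom, any bonds
17 ('[#6]#[#6]', 0) CTC carbon-carbon triple bond
18 ('[#5,#13,#31,#49,#81]', 0) Group IIIA boron-group elements (B, Al, Ga, In, Tl), excluding period 7
19 ('*1~*~*~*~*~*~*~1', 0) 7M Ring seven-membered ring, any atoms, any bonds
20 ('[#14]', 0) Si silicon
21 ('[#6]=[#6](~[!#6;!#1])~[!#6;!#1]', 0) C=C(Q)Q C=C in which one carbon bears two heteroatoms
22 ('*1~*~*~1', 0) 3M Ring three-membered ring, any atoms, any bonds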
23 ('[#7]~[#6](~[#8])~[#8]', 0) NC(O)O
24 ('[#7]-[#8]', 0) N-O nitrogen-oxygen single bond
25 ('[#7]~[#6](~[#7])~[#7]', 0) NC(N)N
26 ('[#6]=;@[#6](@*)@*', 0) C$=C($A)$A
27 ('[I]', 0) I iodine
28 ('[!#6;!#1]~[CH2]~[!#6;!#1]', 0) QCH2Q methylene between two heteroatoms
29 ('[#15]', 0) P phosphorus
30 ('[#6]~[!#6;!#1](~[#6])(~[#6])~*', 0) CQ(C)(C)A
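31 ('[!#6;!#1]~[F,Cl,Br,I]', 0) QX halogen bonded to a heteroatom
32 ('[#6]~[#16]~[#7]', 0) CSN C, S, and N linked in sequence by any bonds
33 ('[#7]~[#16]', 0) NS nitrogen bonded to sulfur, any bond
34 ('[CH2]=*', 0) CH2=A CH2 double-bonded to any atom
35 ('[Li,Na,K,Rb,Cs,Fr]', 0) Group IA (Alkali Metal)
36 ('[#16R]', 0) S Heterocycle sulfur atom in a ring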
37 ('[#7]~[#6](~[#8])~[#7]', 0) NC(O)N
38 ('[#7]~[#6](~[#6])~[#7]', 0) NC(C)N
39 ('[#8]~[#16](~[#8])~[#8]', 0) OS(O)O
40 ('[#16]-[#8]', 0) S-O sulfur-oxygen single bond
41 ('[#6]#[#7]', 0) CTN carbon-nitrogen triple bond
42 ('F', 0) F fluorine
43 ('[!#6;!#1;!H0]~*~[!#6;!#1;!H0]', 0) QHAQH
44 ('[!#1;!#6;!#7;!#8;!#9;!#14;!#15;!#16;!#17;!#35;!#53]', 0) other any element other than H, C, N, O, F, Si, P, S, Cl, Br, I
45 ('[#6]=[#6]~[#7]', 0) C=CN
46 ('Br', 0) Br bromine
47 ('[#16]~*~[#7]', 0) SAN S and N separated by any one atom
48 ('[#8]~[!#6;!#1](~[#8])(~[#8])', 0) OQ(O)O
49 ('[!+0]', 0) charge atom with a nonzero formal charge
50 ('[#6]=[#6](~[#6])~[#6]', 0) C=C(C)C
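51 ('[#6]~[#16]~[#8]', 0) CSO C, S, and O linked in sequence by any bonds
52 ('[#7]~[#7]', 0) NN nitrogen bonded to nitrogen, any bond
53 ('[!#6;!#1;!H0]~*~*~*~[!#6;!#1;!H0]', 0) QHAAAQH two hydrogen-bearing heteroatoms separated by three atoms
54 ('[!#6;!#1;!H0]~*~*~[!#6;!#1;!H0]', 0) QHAAQH two hydrogen-bearing heteroatoms separated by two atoms
55 ('[#8]~[#16]~[#8]', 0) OSO O, S, and O linked in sequence by any bonds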
56 ('[#8]~[#7](~[#8])~[#6]', 0) ON(O)C
57 ('[#8R]', 0) O Heterocycle oxygen atom in a ring
58 ('[!#6;!#1]~[#16]~[!#6;!#1]', 0) QSQ sulfur between two heteroatoms
59 ('[#16]!:*:*', 0) Snot%A%A (% denotes an aromatic bond)
60 ('[#16]=[#8]', 0) S=O sulfur-oxygen double bond
61 ('*~[#16](~*)~*', 0) AS(A)A
62 ('*@*!@*@*', 0) A$!A$A
63 ('[#7]=[#8]', 0) N=O
64 ('*@*!@[#16]', 0) A$A!S
65 ('c:n', 0) C%N
66 ('[#6]~[#6](~[#6])(~[#6])~*', 0) CC(C)(C)A
67 ('[!#6;!#1]~[#16]', 0) QS
68 ('[!#6;!#1;!H0]~[!#6;!#1;!H0]', 0) QHQH
69 ('[!#6;!#1]~[!#6;!#1;!H0]', 0) QH
70 ('[!#6;!#1]~[#7]~[!#6;!#1]', 0) QNQ
71 ('[#7]~[#8]', 0) NO
72 ('[#8]~*~*~[#8]', 0) OAAO
73 ('[#16]=*', 0) S=A sulfur double-bonded to any atom
74 ('[CH3]~*~[CH3]', 0) CH3ACH3
75 ('*!@[#7]@*', 0) A!N$A
76 ('[#6]=[#6](~*)~*', 0) C=C(A)A
77 ('[#7]~*~[#7]', 0) NAN
78 ('[#6]=[#7]', 0) C=N
79 ('[#7]~*~*~[#7]', 0) NAAN
80 ('[#7]~*~*~*~[#7]', 0) NAAAN
81 ('[#16]~*(~*)~*', 0) SA(A)A
82 ('*~[CH2]~[!#6;!#1;!H0]', 0) ACH2QH
83 ('[!#6;!#1]1~*~*~*~*~1', 0) QAAAA@1
84 ('[NH2]', 0) NH2 amino group
85 ('[#6]~[#7](~[#6])~[#6]', 0) CN(C)C
86 ('[C;H2,H3][!#6;!#1][C;H2,H3]', 0) CH2QCH2
87 ('[F,Cl,Br,I]!@*@*', 0) X!A$A (X denotes a halogen)
88 ('[#16]', 0) S sulfur atom
89 ('[#8]~*~*~*~[#8]', 0) OAAAO
90 ('[$([!#6;!#1;!H0]~*~*~[CH2]~*),$([!#6;!#1;!H0;R]1@[R]@[R]@[CH2;R]1),$([!#6;!#1;!H0]~[R]1@[R]@[CH2;R]1)]', 0) QHAACH2A
91 ('[$([!#6;!#1;!H0]~*~*~*~[CH2]~*),$([!#6;!#1;!H0;R]1@[R]@[R]@[R]@[CH2;R]1),$([!#6;!#1;!H0]~[R]1@[R]@[R]@[CH2;R]1),$([!#6;!#1;!H0]~*~[R]1@[R]@[CH2;R]1)]', 0) QHAAACH2A
92 ('[#8]~[#6](~[#7])~[#6]', 0) OC(N)C
93 ('[!#6;!#1]~[CH3]', 0) QCH3
94 ('[!#6;!#1]~[#7]', 0) QN
95 ('[#7]~*~*~[#8]', 0) NAAO
96 ('*1~*~*~*~*~1', 0) 5M Ring five-membered ring, any atoms, any bonds
97 ('[#7]~*~*~*~[#8]', 0) NAAAO
98 ('[!#6;!#1]1~*~*~*~*~*~1', 0) QAAAAA@1 six-membered ring containing a heteroatom
99 ('[#6]=[#6]', 0) C=C
100 ('*~[CH2]~[#7]', 0) ACH2N
101 ('[$([R]@1@[R]@[R]@[R]@[R]@[R]@[R]@[R]1),$([R]@1@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]1),$([R]@1@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]1),$([R]@1@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]1),$([R]@1@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]1),$([R]@1@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]1),$([R]@1@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]1)]', 0) 8M Ring or larger. This only handles up to ring sizes of 14
102 ('[!#6;!#1]~[#8]', 0) QO
103 ('Cl', 0) CL chlorine atom
104 ('[!#6;!#1;!H0]~*~[CH2]~*', 0) QHACH2A
105 ('*@*(@*)@*', 0) A$A($A)$A
106 ('[!#6;!#1]~*(~[!#6;!#1])~[!#6;!#1]', 0) QA(Q)Q
107 ('[F,Cl,Br,I]~*(~*)~*', 0) XA(A)A
108 ('[CH3]~*~*~*~[CH2]~*', 0) CH3AAACH2A
109 ('*~[CH2]~[#8]', 0) ACH2O
110 ('[#7]~[#6]~[#8]', 0) NCO
111 ('[#7]~*~[CH2]~*', 0) NACH2A
112 ('*~*(~*)(~*)~*', 0) AA(A)(A)A
113 ('[#8]!:*:*', 0) Onot%A%A
114 ('[CH3]~[CH2]~*', 0) CH3CH2A
115 ('[CH3]~*~[CH2]~*', 0) CH3ACH2A
116 ('[$([CH3]~*~*~[CH2]~*),$([CH3]~*1~*~[CH2]1)]', 0) CH3AACH2A
117 ('[#7]~*~[#8]', 0) NAO
118 ('[$(*~[CH2]~[CH2]~*),$(*1~[CH2]~[CH2]1)]', 1) ACH2CH2A > 1
119 ('[#7]=*', 0) N=A
120 ('[!#6;R]', 1) Heterocyclic atom > 1 more than one ring heteroatom
121 ('[#7;R]', 0) N Heterocycle nitrogen atom in a ring
122 ('*~[#7](~*)~*', 0) AN(A)A
123 ('[#8]~[#6]~[#8]', 0) OCO
124 ('[!#6;!#1]~[!#6;!#1]', 0) QQ
125 ('?', 0) Aromatic Ring > 1 more than one aromatic ring
126 ('*!@[#8]!@*', 0) A!O!A
127 ('*@*!@[#8]', 1) A$A!O > 1
128 ('[$(*~[CH2]~*~*~*~[CH2]~*),$([R]1@[CH2;R]@[R]@[R]@[R]@[CH2;R]1),$(*~[CH2]~[R]1@[R]@[R]@[CH2;R]1),$(*~[CH2]~*~[R]1@[R]@[CH2;R]1)]', 0) ACH2AAACH2A
129 ('[$(*~[CH2]~*~*~[CH2]~*),$([R]1@[CH2]@[R]@[R]@[CH2;R]1),$(*~[CH2]~[R]1@[R]@[CH2;R]1)]', 0) ACH2AACH2A
130 ('[!#6;!#1]~[!#6;!#1]', 1) QQ > 1
131 ('[!#6;!#1;!H0]', 1) QH > 1
132 ('[#8]~*~[CH2]~*', 0) OACH2A
133 ('*@*!@[#7]', 0) A$A!N
134 ('[F,Cl,Br,I]', 0) X (HALOGEN)
135 ('[#7]!:*:*', 0) Nnot%A%A
136 ('[#8]=*', 1) O=A>1
137 ('[!C;!c;R]', 0) Heterocycle whether a ring heteroatom is present
138 ('[!#6;!#1]~[CH2]~*', 1) QCH2A>1
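139 ('[O;!H0]', 0) OH hydroxyl group (oxygen bearing at least one hydrogen)
140 ('[#8]', 3) O > 3 more than three oxygen atoms
141 ('[CH3]', 2) CH3 > 2 more than two methyl groups
142 ('[#7]', 1) N > 1 more than one nitrogen atom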
143 ('*@*!@[#8]', 0) A$A!O
144 ('*!:*:*!:*', 0) Anot%A%Anot%A
145 ('*1~*~*~*~*~*~1', 1) 6M ring > 1 more than one six-membered ring
146 ('[#8]', 2) O > 2 more than two oxygen atoms
147 ('[$(*~[CH2]~[CH2]~*),$([R]1@[CH2;R]@[CH2;R]1)]', 0) ACH2CH2A
148 ('*~[!#6;!#1](~*)~*', 0) AQ(A)A
149 ('[C;H3,H4]', 1) CH3 > 1 more than one methyl group
150 ('*!@*@*!@*', 0) A!A$A!A
151 ('[#7;!H0]', 0) NH nitrogen bearing at least one hydrogen
152 ('[#8]~[#6](~[#6])~[#6]', 0) OC(C)C
153 ('[!#6;!#1]~[CH2]~*', 0) QCH2A
154 ('[#6]=[#8]', 0) C=O
155 ('*!@[CH2]!@*', 0) A!CH2!A
156 ('[#7]~*(~*)~*', 0) NA(A)A
157 ('[#6]-[#8]', 0) C-O
158 ('[#6]-[#7]', 0) C-N
159 ('[#8]', 1) O>1 more than one oxygen atom
160 ('[C;H3,H4]', 0) CH3 methyl group
161 ('[#7]', 0) N nitrogen atom
162 ('a', 0) Aromatic at least one aromatic atom
163 ('*1~*~*~*~*~*~1', 0) 6M Ring six-membered ring
164 ('[#8]', 0) O oxygen atom
165 ('[R]', 0) Ring whether any ring is present
166 ('?', 0) Fragments FIX: this can't be done in SMARTS
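Keys marked '?' have no SMARTS pattern in the table. For key 166, RDKit is understood to set the bit in code when the molecule consists of more than one disconnected fragment (a salt and its counterion, for example). The sketch below illustrates that idea with a union-find over a hypothetical atom-index and bond-list representation; `count_fragments` and `key_166_is_set` are illustrative names, not RDKit functions.

```python
from typing import Iterable, List, Tuple


def count_fragments(n_atoms: int, bonds: Iterable[Tuple[int, int]]) -> int:
    """Count connected components of a molecular graph with union-find."""
    parent: List[int] = list(range(n_atoms))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two fragments
    return len({find(i) for i in range(n_atoms)})


def key_166_is_set(n_atoms: int, bonds: Iterable[Tuple[int, int]]) -> bool:
    """Assumed rule for key 166: on when there is more than one fragment."""
    return count_fragments(n_atoms, bonds) > 1
```

For the heavy atoms of sodium acetate (C, C, O, O, Na with bonds 0-1, 1-2, 1-3 and an isolated Na), `key_166_is_set(5, [(0, 1), (1, 2), (1, 3)])` returns `True`; for ethanol it returns `False`.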

References:

  1. Durant JL, Leland BA, Henry DR, Nourse JG: Reoptimization of MDL keys for use in drug discovery. J Chem Inf Comput Sci 2002, 42:1273-1280.
  2. THE KEYS TO UNDERSTANDING MDL KEYSET TECHNOLOGY. https://www.3dsbiovia.com/products/pdf/keys-to-keyset-technology.pdf. Accessed Oct. 2019.
  3. RDKit. https://www.rdkit.org/
  4. O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR: Open Babel: An open chemical toolbox. J Cheminformatics 2011, 3:33.
  5. The Open Babel Package. https://openbabel.org
  6. Chemistry Development Kit (CDK).
  7. Willighagen EL, Mayfield JW, Alvarsson J, Berg A, Carlsson L, Jeliazkova N, Kuhn S, Pluskal T, Rojas-Cherto M, Spjuth O, Torrance G, Evelo CT, Guha R, Steinbeck C: The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J Cheminformatics 2017, 9:33.