Getting tables from the database

This short tutorial explains how to retrieve full tables from the database into pandas DataFrames.

The following tables are available from mendeleev

  • elements
  • ionicradii
  • ionizationenergies
  • oxidationstates
  • groups
  • series
  • isotopes

mendeleev provides a convenient function get_table to perform the task at hand. The function can be directly imported from mendeleev

In [4]:
from mendeleev import get_table

To retrieve a table call the get_table with the table name as argument. Here we’ll get probably the most important table elements with basis data on each element

In [5]:
ptable = get_table('elements')

Now we can use pandas’ capabilities to work with the data.

In [6]:
ptable.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 118 entries, 0 to 117
Data columns (total 67 columns):
annotation                       118 non-null object
atomic_number                    118 non-null int64
atomic_radius                    88 non-null float64
atomic_volume                    91 non-null float64
block                            118 non-null object
boiling_point                    96 non-null float64
density                          95 non-null float64
description                      109 non-null object
dipole_polarizability            106 non-null float64
electron_affinity                77 non-null float64
electronic_configuration         118 non-null object
evaporation_heat                 88 non-null float64
fusion_heat                      75 non-null float64
group_id                         90 non-null float64
lattice_constant                 87 non-null float64
lattice_structure                91 non-null object
melting_point                    100 non-null float64
name                             118 non-null object
period                           118 non-null int64
series_id                        118 non-null int64
specific_heat                    81 non-null float64
symbol                           118 non-null object
thermal_conductivity             66 non-null float64
vdw_radius                       103 non-null float64
covalent_radius_cordero          96 non-null float64
covalent_radius_pyykko           118 non-null float64
en_pauling                       85 non-null float64
en_allen                         71 non-null float64
jmol_color                       109 non-null object
cpk_color                        103 non-null object
proton_affinity                  32 non-null float64
gas_basicity                     32 non-null float64
heat_of_formation                89 non-null float64
c6                               43 non-null float64
covalent_radius_bragg            37 non-null float64
covalent_radius_slater           86 non-null float64
vdw_radius_bondi                 28 non-null float64
vdw_radius_truhlar               16 non-null float64
vdw_radius_rt                    9 non-null float64
vdw_radius_batsanov              65 non-null float64
vdw_radius_dreiding              21 non-null float64
vdw_radius_uff                   103 non-null float64
vdw_radius_mm3                   94 non-null float64
abundance_crust                  88 non-null float64
abundance_sea                    81 non-null float64
molcas_gv_color                  103 non-null object
en_ghosh                         103 non-null float64
vdw_radius_alvarez               94 non-null float64
c6_gb                            86 non-null float64
atomic_weight                    118 non-null float64
atomic_weight_uncertainty        74 non-null float64
is_monoisotopic                  21 non-null object
is_radioactive                   118 non-null bool
cas                              118 non-null object
atomic_radius_rahm               96 non-null float64
geochemical_class                76 non-null object
goldschmidt_class                118 non-null object
metallic_radius                  56 non-null float64
metallic_radius_c12              63 non-null float64
covalent_radius_pyykko_double    108 non-null float64
covalent_radius_pyykko_triple    80 non-null float64
discoverers                      118 non-null object
discovery_year                   105 non-null float64
discovery_location               106 non-null object
name_origin                      118 non-null object
sources                          118 non-null object
uses                             112 non-null object
dtypes: bool(1), float64(44), int64(3), object(19)
memory usage: 61.0+ KB

For clarity let’s take only a subset of columns

In [7]:
cols = ['atomic_number', 'symbol', 'atomic_radius', 'en_pauling', 'block', 'vdw_radius_mm3']
In [8]:
ptable[cols].head()
Out[8]:
atomic_number symbol atomic_radius en_pauling block vdw_radius_mm3
0 1 H 79.0 2.20 s 162.0
1 2 He NaN NaN s 153.0
2 3 Li 155.0 0.98 s 255.0
3 4 Be 112.0 1.57 s 223.0
4 5 B 98.0 2.04 p 215.0

It is quite easy now to get descriptive statistics on the data.

In [9]:
ptable[cols].describe()
Out[9]:
atomic_number atomic_radius en_pauling vdw_radius_mm3
count 118.000000 88.000000 85.000000 94.000000
mean 59.500000 169.397727 1.748588 248.468085
std 34.207699 49.810108 0.634442 36.017828
min 1.000000 79.000000 0.700000 153.000000
25% 30.250000 137.000000 1.240000 229.000000
50% 59.500000 160.000000 1.700000 244.000000
75% 88.750000 181.000000 2.160000 269.250000
max 118.000000 299.000000 3.980000 364.000000

Isotopes table

Let try and retrieve another table, namely isotopes

In [10]:
isotopes = get_table('isotopes', index_col='id')
In [11]:
isotopes.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 406 entries, 1 to 406
Data columns (total 11 columns):
atomic_number        406 non-null int64
mass                 377 non-null float64
abundance            288 non-null float64
mass_number          406 non-null int64
mass_uncertainty     377 non-null float64
is_radioactive       406 non-null bool
half_life            121 non-null float64
half_life_unit       85 non-null object
spin                 323 non-null float64
g_factor             323 non-null float64
quadrupole_moment    320 non-null float64
dtypes: bool(1), float64(7), int64(2), object(1)
memory usage: 35.3+ KB

Merge the elements table with the isotopes

We can now perform SQL-like merge operation on two DataFrames and produce an outer join

In [12]:
import pandas as pd
In [13]:
merged = pd.merge(ptable[cols], isotopes, how='outer', on='atomic_number')

now we have the following columns in the merged DataFrame

In [14]:
merged.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 406 entries, 0 to 405
Data columns (total 16 columns):
atomic_number        406 non-null int64
symbol               406 non-null object
atomic_radius        325 non-null float64
en_pauling           313 non-null float64
block                406 non-null object
vdw_radius_mm3       350 non-null float64
mass                 377 non-null float64
abundance            288 non-null float64
mass_number          406 non-null int64
mass_uncertainty     377 non-null float64
is_radioactive       406 non-null bool
half_life            121 non-null float64
half_life_unit       85 non-null object
spin                 323 non-null float64
g_factor             323 non-null float64
quadrupole_moment    320 non-null float64
dtypes: bool(1), float64(10), int64(2), object(3)
memory usage: 51.1+ KB
In [15]:
merged.head()
Out[15]:
atomic_number symbol atomic_radius en_pauling block vdw_radius_mm3 mass abundance mass_number mass_uncertainty is_radioactive half_life half_life_unit spin g_factor quadrupole_moment
0 1 H 79.0 2.2 s 162.0 1.007825 0.999720 1 6.000000e-10 False NaN None 0.5 5.585695 0.00000
1 1 H 79.0 2.2 s 162.0 2.014102 0.000280 2 8.000000e-10 False NaN None 1.0 0.857438 0.00286
2 1 H 79.0 2.2 s 162.0 NaN NaN 3 NaN True NaN None 0.5 5.957994 0.00000
3 2 He NaN NaN s 153.0 3.016029 0.000002 3 2.000000e-08 False NaN None 0.5 -4.254995 0.00000
4 2 He NaN NaN s 153.0 4.002603 0.999998 4 4.000000e-10 False NaN None 0.0 0.000000 0.00000

To display all the isotopes of Silicon

In [16]:
merged[merged['symbol'] == 'Si']
Out[16]:
atomic_number symbol atomic_radius en_pauling block vdw_radius_mm3 mass abundance mass_number mass_uncertainty is_radioactive half_life half_life_unit spin g_factor quadrupole_moment
28 14 Si 132.0 1.9 p 229.0 27.976927 0.92191 28 3.000000e-09 False NaN None 0.0 0.00000 0.0
29 14 Si 132.0 1.9 p 229.0 28.976495 0.04699 29 3.000000e-09 False NaN None 0.5 -1.11058 0.0
30 14 Si 132.0 1.9 p 229.0 29.973770 0.03110 30 2.000000e-08 False NaN None 0.0 0.00000 0.0
In [17]:
%version_information mendeleev, sqlalchemy, pandas
Out[17]:
SoftwareVersion
Python3.6.3 64bit [GCC 7.2.0]
IPython6.2.1
OSLinux 4.9.0 4 amd64 x86_64 with debian 9.1
mendeleev0.3.6
sqlalchemy1.1.13
pandas0.20.3
Wed Nov 01 15:15:51 2017 CET