Example Aggregations
Aggregating data with MDF
Searches using Forge.search() are limited to 10,000 results. There are two methods to work around this limit: Forge.aggregate_sources() and Forge.aggregate().
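To make the idea of working around a result cap concrete, here is a minimal, self-contained sketch of one general approach: split a large query into sub-ranges that each stay under the cap, then concatenate the pieces. Everything in it (`RESULT_CAP`, `FAKE_INDEX`, `capped_search`, `aggregate_by_ranges`) is hypothetical and runs on in-memory toy data. It illustrates the concept only and is not a description of how mdf_forge implements aggregation.

```python
RESULT_CAP = 3  # stand-in for the 10,000-result limit (hypothetical toy value)

# Hypothetical in-memory "index"; each entry carries an integer id to partition on.
FAKE_INDEX = [{"id": i, "elements": ["Ga"]} for i in range(10)]


def capped_search(lo, hi):
    """Toy search: return at most RESULT_CAP entries with lo <= id < hi."""
    hits = [entry for entry in FAKE_INDEX if lo <= entry["id"] < hi]
    return hits[:RESULT_CAP]


def aggregate_by_ranges(total, step=RESULT_CAP):
    """Collect every entry by issuing one capped search per id range."""
    results = []
    for lo in range(0, total, step):
        results.extend(capped_search(lo, lo + step))
    return results


all_hits = aggregate_by_ranges(len(FAKE_INDEX))
print(len(all_hits))
```

Because each sub-range holds no more entries than the cap, no single search is truncated, and the combined list covers the whole toy index.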
[1]:
import json
from mdf_forge.forge import Forge
[2]:
mdf = Forge()
aggregate_sources - NIST XPS DB
Example: We want to collect all records from the NIST XPS Database and analyze the binding energies. This database has almost 30,000 records, which exceeds the search() limit, so we have to use aggregate_sources().
[3]:
# First, let's aggregate all the nist_xps_db data.
all_entries = mdf.aggregate_sources("nist_xps_db")
print(len(all_entries))
29190
[4]:
# Now, let's parse out the energy_uncertainty_ev and print the results for analysis.
uncertainties = {}
for record in all_entries:
    # Skip the dataset entry; only individual records carry the field.
    if record["mdf"]["resource_type"] == "record":
        # Records without the field are counted under 0.
        unc = record.get("nist_xps_db_v1", {}).get("energy_uncertainty_ev", 0)
        if not uncertainties.get(unc):
            uncertainties[unc] = 1
        else:
            uncertainties[unc] += 1
print(json.dumps(uncertainties, sort_keys=True, indent=4, separators=(',', ': ')))
{
"0": 29189
}
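The tally above uses 0 as the default when energy_uncertainty_ev is absent, so a record with no recorded uncertainty is indistinguishable from one whose uncertainty is exactly zero. The sketch below shows one possible way to keep the two cases apart using collections.Counter. The helper name `tally_uncertainties`, the "missing" label, and the sample entries are all hypothetical; the sample entries only mimic the record shape used above and are not real MDF data.

```python
from collections import Counter

# Hypothetical entries shaped like the records above (not real MDF data).
sample_entries = [
    {"mdf": {"resource_type": "dataset"}},
    {"mdf": {"resource_type": "record"},
     "nist_xps_db_v1": {"energy_uncertainty_ev": 0}},
    {"mdf": {"resource_type": "record"},
     "nist_xps_db_v1": {"energy_uncertainty_ev": 0.2}},
    {"mdf": {"resource_type": "record"}, "nist_xps_db_v1": {}},
    {"mdf": {"resource_type": "record"}},
]


def tally_uncertainties(entries, block="nist_xps_db_v1",
                        field="energy_uncertainty_ev"):
    """Count field values per record, labelling absent values as 'missing'."""
    counts = Counter()
    for entry in entries:
        if entry["mdf"]["resource_type"] != "record":
            continue  # skip dataset-level entries
        value = entry.get(block, {}).get(field, "missing")
        counts[str(value)] += 1  # string keys keep the tally JSON-friendly
    return dict(counts)


sample_counts = tally_uncertainties(sample_entries)
print(sample_counts)
```

Run against the real all_entries list, a tally like this would show whether the single "0" bucket reflects measured zeros or simply absent fields.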
aggregate - Multiple Datasets
Example: We want to analyze how often other elements are studied with gallium (Ga) and find the most frequent elemental pairing. More than 10,000 records contain gallium, so a plain search() would be truncated and we use aggregate() instead.
[5]:
# First, let's aggregate everything that has "Ga" in the list of elements.
all_results = mdf.aggregate("material.elements:Ga")
print(len(all_results))
18232
[6]:
# Now, let's parse out the other elements in each record and keep a running tally to print out.
elements = {}
for record in all_results:
    if record["mdf"]["resource_type"] == "record":
        elems = record["material"]["elements"]
        for elem in elems:
            if elem in elements:
                elements[elem] += 1
            else:
                elements[elem] = 1
print(json.dumps(elements, sort_keys=True, indent=4, separators=(',', ': ')))
{
"Ac": 267,
"Ag": 323,
"Al": 322,
"Ar": 2,
"As": 872,
"Au": 372,
"B": 301,
"Ba": 342,
"Be": 281,
"Bi": 4172,
"Br": 38,
"C": 87,
"Ca": 370,
"Cd": 174,
"Ce": 325,
"Cl": 57,
"Co": 381,
"Cr": 315,
"Cs": 160,
"Cu": 403,
"Dy": 317,
"Er": 321,
"Eu": 304,
"F": 84,
"Fe": 2989,
"Ga": 18232,
"Gd": 156,
"Ge": 333,
"H": 159,
"Hf": 310,
"Hg": 282,
"Ho": 323,
"I": 41,
"In": 364,
"Ir": 305,
"K": 313,
"La": 312,
"Li": 469,
"Lu": 291,
"Mg": 683,
"Mn": 4357,
"Mo": 437,
"N": 137,
"Na": 339,
"Nb": 296,
"Nd": 179,
"Ni": 363,
"Np": 252,
"O": 1390,
"On": 6,
"Os": 288,
"Ox": 39,
"P": 153,
"Pa": 272,
"Pb": 278,
"Pd": 361,
"Pm": 273,
"Pr": 312,
"Pt": 338,
"Pu": 280,
"Rb": 163,
"Re": 134,
"Rh": 320,
"Ru": 304,
"S": 161,
"Sb": 327,
"Sc": 331,
"Se": 138,
"Si": 412,
"Sm": 330,
"Sn": 303,
"Sr": 221,
"Ta": 160,
"Tb": 174,
"Tc": 139,
"Te": 361,
"Th": 287,
"Ti": 211,
"Tl": 295,
"Tm": 312,
"U": 223,
"V": 1646,
"Va": 2,
"W": 259,
"Xe": 1,
"Y": 332,
"Yb": 324,
"Zn": 315,
"Zr": 167
}
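The tally above answers the first half of the question, but the most frequent pairing still has to be read off by eye. A minimal sketch of one way to rank the partners with collections.Counter follows. The helper name `top_partners` is hypothetical, and `tally_subset` holds only a handful of counts copied from the output above so the example runs without querying MDF; in the notebook the full elements dictionary could be passed instead.

```python
from collections import Counter


def top_partners(tally, target="Ga", n=3):
    """Return the n most frequent elements in tally, excluding target itself."""
    partners = Counter(
        {elem: count for elem, count in tally.items() if elem != target}
    )
    return partners.most_common(n)


# A few counts copied from the printed tally above (subset for illustration).
tally_subset = {
    "Bi": 4172,
    "Fe": 2989,
    "Ga": 18232,
    "Mn": 4357,
    "O": 1390,
    "V": 1646,
}

ranking = top_partners(tally_subset)
print(ranking)
# prints [('Mn', 4357), ('Bi', 4172), ('Fe', 2989)]
```

The target element is dropped first because every aggregated record contains Ga, so its count equals the total number of records and would otherwise always rank first.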