{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 4 - General Helper Functions" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from mdf_forge.forge import Forge" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "mdf = Forge()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generally Useful Help" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### current_query\n", "You can see the query you're currently building with `current_query()`.\n", "\n", "Note that your query may be enclosed in parentheses automatically. This does not alter the results of the query." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'(mdf.source_name:oqmd)'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdf.match_field(\"mdf.source_name\", \"oqmd\")\n", "mdf.current_query()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### reset_query\n", "If you have a query in memory that you don't want, you can use `reset_query()` to start a new query. This method will clear the current query entirely." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "mdf.reset_query()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "''" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdf.current_query()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Query info\n", "We can build a query using `exclude_field()` and `match_field()` and execute it with `search()`. But if you are interested in knowing more about the query, including the actual query string that was made, you can use the `info=True` argument to `search()`." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "mdf.exclude_field(\"mdf.source_name\", \"sluschi\").match_field(\"material.elements\", \"Al\").exclude_field(\"mdf.source_name\", \"oqmd\")\n", "res, info = mdf.search(limit=10, info=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When you use the `info=True` argument, `search()` will return a tuple instead of a list. The first element in the tuple will be the same list of results you're used to, but the second tuple element will be a dictionary of query info." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "{'crystal_structure': {'number_of_atoms': 108.0,\n", " 'space_group_number': 225,\n", " 'stoichiometry': 'A',\n", " 'volume': 1779.162},\n", " 'files': [{'data_type': 'ASCII text',\n", " 'filename': 'INCAR',\n", " 'globus': 'globus://e38ee745-6d04-11e5-ba46-22000b92c6ec/MDF/mdf_connect/prod/data/ab_initio_solute_database_v1-2/data/FCC_solute_AlCu_20140918T204831/perfect_stat/INCAR',\n", " 'length': 169,\n", " 'mime_type': 'text/plain',\n", " 'sha512': 'da3b28318b6c8496dda80d81f89176edc55997c2b75dafbcf92fdd8bb6c30d0dc27d2c3cfff8383e541ccd85bd84629da60b1e68da6411b5974f31bd85de0f8a',\n", " 'url': 'https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org/MDF/mdf_connect/prod/data/ab_initio_solute_database_v1-2/data/FCC_solute_AlCu_20140918T204831/perfect_stat/INCAR'},\n", " {'data_type': 'ASCII text',\n", " 'filename': 'CONTCAR',\n", " 'globus': 'globus://e38ee745-6d04-11e5-ba46-22000b92c6ec/MDF/mdf_connect/prod/data/ab_initio_solute_database_v1-2/data/FCC_solute_AlCu_20140918T204831/perfect_stat/CONTCAR',\n", " 'length': 3348,\n", " 'mime_type': 'text/plain',\n", " 'sha512': '613498249ad2d01dc3cb4aa37a41bd63ba2c95a599ab0b0cb43f4e30c3f5381a8c2a1404d3c20d24da434b0785189bb603859a99f5abbcc242cc87cc9c3e0ca8',\n", " 'url': 
'https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org/MDF/mdf_connect/prod/data/ab_initio_solute_database_v1-2/data/FCC_solute_AlCu_20140918T204831/perfect_stat/CONTCAR'},\n", " {'data_type': 'ASCII text',\n", " 'filename': 'KPOINTS',\n", " 'globus': 'globus://e38ee745-6d04-11e5-ba46-22000b92c6ec/MDF/mdf_connect/prod/data/ab_initio_solute_database_v1-2/data/FCC_solute_AlCu_20140918T204831/perfect_stat/KPOINTS',\n", " 'length': 42,\n", " 'mime_type': 'text/plain',\n", " 'sha512': '56f819a7cff23127409c48d69cef684f578600b02b9727fc3ab46aa297bb201b890dabcfc4b232088fb4d0cb938514283986f8eae68a85bdf303960d2f9058dd',\n", " 'url': 'https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org/MDF/mdf_connect/prod/data/ab_initio_solute_database_v1-2/data/FCC_solute_AlCu_20140918T204831/perfect_stat/KPOINTS'},\n", " {'data_type': 'ASCII text',\n", " 'filename': 'POSCAR',\n", " 'globus': 'globus://e38ee745-6d04-11e5-ba46-22000b92c6ec/MDF/mdf_connect/prod/data/ab_initio_solute_database_v1-2/data/FCC_solute_AlCu_20140918T204831/perfect_stat/POSCAR',\n", " 'length': 3348,\n", " 'mime_type': 'text/plain',\n", " 'sha512': '613498249ad2d01dc3cb4aa37a41bd63ba2c95a599ab0b0cb43f4e30c3f5381a8c2a1404d3c20d24da434b0785189bb603859a99f5abbcc242cc87cc9c3e0ca8',\n", " 'url': 'https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org/MDF/mdf_connect/prod/data/ab_initio_solute_database_v1-2/data/FCC_solute_AlCu_20140918T204831/perfect_stat/POSCAR'}],\n", " 'material': {'composition': 'Al108', 'elements': ['Al']},\n", " 'mdf': {'ingest_date': '2018-11-24T08:12:11.852893Z',\n", " 'resource_type': 'record',\n", " 'scroll_id': 28093,\n", " 'source_id': 'ab_initio_solute_database_v1.2',\n", " 'source_name': 'ab_initio_solute_database',\n", " 'version': 1}}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res[0]" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'advanced': True,\n", " 'errors': 
[],\n", " 'index_uuid': '1a57bbe5-5272-477f-9d31-343b8258b7a5',\n", " 'limit': 10,\n", " 'query': '( NOT mdf.source_name:sluschi AND material.elements:Al AND NOT mdf.source_name:oqmd)',\n", " 'retries': 0,\n", " 'total_query_matches': 14886}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "info" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Repeat a query\n", "You can stop a query from being cleared out of memory after a search by using the `reset_query=False` argument." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdf.match_field(\"mdf.source_name\", \"nist_xps_db\")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'(mdf.source_name:nist_xps_db)'" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res, info = mdf.search(limit=10, info=True, reset_query=False)\n", "info[\"query\"]" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'(mdf.source_name:nist_xps_db)'" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res, info = mdf.search(limit=10, info=True)\n", "info[\"query\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### show_fields\n", "How do you know what fields there are to search on? Use `show_fields()` to find out. If you just call `show_fields()` by itself, it will show you all fields currently in the MDF Search index." 
] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'calphad.phases': 'text',\n", " 'cip.bv': 'text',\n", " 'cip.energy': 'text',\n", " 'cip.forcefield': 'text',\n", " 'cip.gv': 'text',\n", " 'cip.mpid': 'text',\n", " 'cip.totenergy': 'text',\n", " 'crystal_structure.cross_reference.icsd': 'long',\n", " 'crystal_structure.number_of_atoms': 'long',\n", " 'crystal_structure.space_group_number': 'long',\n", " 'crystal_structure.stoichiometry': 'text',\n", " 'crystal_structure.volume': 'float',\n", " 'custom.all_materials_included': 'text',\n", " 'custom.atom_fractions': 'text',\n", " 'custom.experiment_holding_temperature': 'text',\n", " 'custom.experiment_nominal_alloy_composition': 'text',\n", " 'custom.experiment_pixel_size': 'text',\n", " 'custom.experiment_time_between_scans': 'text',\n", " 'custom.experiment_total_duration': 'text',\n", " 'custom.experiment_xray_energy': 'text',\n", " 'custom.funding_details': 'text',\n", " 'custom.plate_id': 'text',\n", " 'custom.processing_reconstruction_method': 'text',\n", " 'custom.processing_segmentation_method': 'text',\n", " 'custom.reduction_method': 'text',\n", " 'custom.sample_id': 'text',\n", " 'data.endpoint_path': 'text',\n", " 'data.link': 'text',\n", " 'dc.alternateIdentifiers.alternateIdentifier': 'text',\n", " 'dc.alternateIdentifiers.alternateIdentifierType': 'text',\n", " 'dc.contributors.affiliations': 'text',\n", " 'dc.contributors.contributorName': 'text',\n", " 'dc.contributors.contributorType': 'text',\n", " 'dc.contributors.familyName': 'text',\n", " 'dc.contributors.givenName': 'text',\n", " 'dc.creators.affiliations': 'text',\n", " 'dc.creators.creatorName': 'text',\n", " 'dc.creators.familyName': 'text',\n", " 'dc.creators.givenName': 'text',\n", " 'dc.dates.date': 'date',\n", " 'dc.dates.dateType': 'text',\n", " 'dc.descriptions.description': 'text',\n", " 'dc.descriptions.descriptionType': 'text',\n", " 'dc.geoLocations.geoLocationPlace': 
'text',\n", " 'dc.identifier.identifier': 'text',\n", " 'dc.identifier.identifierType': 'text',\n", " 'dc.publicationYear': 'text',\n", " 'dc.publisher': 'text',\n", " 'dc.relatedIdentifiers.relatedIdentifier': 'text',\n", " 'dc.relatedIdentifiers.relatedIdentifierType': 'text',\n", " 'dc.relatedIdentifiers.relationType': 'text',\n", " 'dc.resourceType.resourceType': 'text',\n", " 'dc.resourceType.resourceTypeGeneral': 'text',\n", " 'dc.rightsList.rights': 'text',\n", " 'dc.rightsList.rightsURI': 'text',\n", " 'dc.subjects.subject': 'text',\n", " 'dc.titles.title': 'text',\n", " 'dft.converged': 'text',\n", " 'dft.cutoff_energy': 'float',\n", " 'dft.exchange_correlation_functional': 'text',\n", " 'electron_microscopy.beam_energy': 'float',\n", " 'files.data_type': 'text',\n", " 'files.filename': 'text',\n", " 'files.globus': 'text',\n", " 'files.length': 'long',\n", " 'files.mime_type': 'text',\n", " 'files.sha512': 'text',\n", " 'files.url': 'text',\n", " 'image.format': 'text',\n", " 'image.height': 'long',\n", " 'image.megapixels': 'float',\n", " 'image.width': 'long',\n", " 'jarvis.__custom.band_gap_desc': 'text',\n", " 'jarvis.__custom.crossreference_desc': 'text',\n", " 'jarvis.__custom.dimensionality_desc': 'text',\n", " 'jarvis.__custom.elastic_moduli_desc': 'text',\n", " 'jarvis.__custom.formation_enthalpy_desc': 'text',\n", " 'jarvis.__custom.id_desc': 'text',\n", " 'jarvis.__custom.landing_page_desc': 'text',\n", " 'jarvis.__custom.total_energy_desc': 'text',\n", " 'jarvis.bandgap.mbj': 'float',\n", " 'jarvis.bandgap.optb88vdw': 'float',\n", " 'jarvis.crossreference.materials_project': 'text',\n", " 'jarvis.dimensionality': 'text',\n", " 'jarvis.elastic_moduli.bulk': 'float',\n", " 'jarvis.elastic_moduli.shear': 'float',\n", " 'jarvis.formation_enthalpy': 'float',\n", " 'jarvis.id': 'text',\n", " 'jarvis.landing_page': 'text',\n", " 'jarvis.total_energy': 'float',\n", " 'material.composition': 'text',\n", " 'material.elements': 'text',\n", " 
'mdf.ingest_date': 'date',\n", " 'mdf.mdf_id': 'text',\n", " 'mdf.organizations': 'text',\n", " 'mdf.resource_type': 'text',\n", " 'mdf.scroll_id': 'long',\n", " 'mdf.source_id': 'text',\n", " 'mdf.source_name': 'text',\n", " 'mdf.version': 'long',\n", " 'mrr.characterizationMethod': 'text',\n", " 'mrr.materialType': 'text',\n", " 'mrr.structuralFeature': 'text',\n", " 'nist_xps_db.binding_energy_ev': 'text',\n", " 'nist_xps_db.energy_uncertainty_ev': 'text',\n", " 'nist_xps_db.notes': 'text',\n", " 'nist_xps_db.temperature_k': 'text',\n", " 'oqmd.__custom.band_gap_desc': 'text',\n", " 'oqmd.__custom.configuration_desc': 'text',\n", " 'oqmd.__custom.delta_e_desc': 'text',\n", " 'oqmd.__custom.magnetic_moment_desc': 'text',\n", " 'oqmd.__custom.stability_desc': 'text',\n", " 'oqmd.__custom.total_energy_desc': 'text',\n", " 'oqmd.__custom.volume_pa_desc': 'text',\n", " 'oqmd.band_gap.units': 'text',\n", " 'oqmd.band_gap.value': 'float',\n", " 'oqmd.configuration': 'text',\n", " 'oqmd.delta_e.units': 'text',\n", " 'oqmd.delta_e.value': 'float',\n", " 'oqmd.magnetic_moment.units': 'text',\n", " 'oqmd.magnetic_moment.value': 'float',\n", " 'oqmd.stability.units': 'text',\n", " 'oqmd.stability.value': 'float',\n", " 'oqmd.total_energy.units': 'text',\n", " 'oqmd.total_energy.value': 'float',\n", " 'oqmd.volume_pa.units': 'text',\n", " 'oqmd.volume_pa.value': 'float',\n", " 'origin.creator': 'text',\n", " 'origin.name': 'text',\n", " 'origin.type': 'text',\n", " 'services.citrine': 'text',\n", " 'services.mdf_publish': 'text',\n", " 'services.mdf_search': 'text',\n", " 'services.mrr': 'text'}" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdf.show_fields()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you give `show_fields()` a top-level block, it will show you the mapping for that block, including the expected datatypes." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "{'mdf.ingest_date': 'date',\n", " 'mdf.mdf_id': 'text',\n", " 'mdf.organizations': 'text',\n", " 'mdf.resource_type': 'text',\n", " 'mdf.scroll_id': 'long',\n", " 'mdf.source_id': 'text',\n", " 'mdf.source_name': 'text',\n", " 'mdf.version': 'long'}" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdf.show_fields(\"mdf\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### describe_field\n", "To learn more about specific fields, use `describe_field()`. This method can tell you what a field means, what unit of measurement it uses, or other useful information. When you call `describe_field()`, you must pass in the `resource_type` you're interested in (such as `dataset` or `record`). Since the full schema for a `resource_type` is very long, you can also pass in a `field` you're interested in, in the standard dot notation (if you don't, you will get the full schema for the `resource_type` instead)." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "- acl (array of string): The IDs of users or groups allowed to view this entry (or [\"public\"]). 
Note that this field does not appear in Search results for security reasons.\n", " Must have at least 1 item(s)\n", "\n", "- ingest_date (string): The RFC 3339 date of ingest.\n", "\n", "- mdf_id (string): The BSON ID of the entry, which is not static between dataset versions.\n", "\n", "- organizations (array of string): The organizations associated with the dataset.\n", "\n", "- parent_id (string): The BSON ID of the entry's parent.\n", "\n", "- resource_type (string): The type of entry.\n", "\n", "- scroll_id (integer): A number to enable aggregating (via simulated scrolling) in Forge.\n", "\n", "- source_id (string): A unique (globally) identifier for the dataset.\n", "\n", "- source_name (string): A unique (to this dataset) program-friendly name for the dataset.\n", "\n", "- version (integer): The version number for the dataset.\n", "\n", "Required: ['source_name', 'source_id', 'mdf_id', 'acl', 'ingest_date', 'resource_type']\n", "\n" ] } ], "source": [ "mdf.describe_field(\"dataset\", field=\"mdf\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want your results in a dictionary instead of being printed out, you can set `raw=True`." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'error': None,\n", " 'schema': {'description': 'A unique (to this dataset) program-friendly name for the dataset.',\n", " 'type': 'string'},\n", " 'status_code': 200,\n", " 'success': True}" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdf.describe_field(\"record\", field=\"mdf.source_name\", raw=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### describe_organization\n", "To learn more about an organization registered with MDF, use `describe_organization()`. This method can tell you more about an organization, including the provided description, homepage, and submission rules. 
When you call `describe_organization()`, you just pass in the name or alias of an organization (capitalization doesn't matter)." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " Argonne National Laboratory\n", "\taliases: ANL\n", "\tcanonical_name: Argonne National Laboratory\n", "\tdescription: Argonne serves America as a science and energy laboratory distinguished by the breadth of our R&D capabilities in concert with our powerful suite of experimental and computational facilities.\n", "\thomepage: https://www.anl.gov/\n", "\tparent_organizations: None\n", "\tpermission_groups: public\n" ] } ], "source": [ "mdf.describe_organization(\"argonne national laboratory\")" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " Center for Hierarchical Materials Design\n", "\taliases: CHiMaD\n", "\tcanonical_name: Center for Hierarchical Materials Design\n", "\tdescription: Center for Hierarchical Materials Design (CHiMaD) is a NIST-sponsored center of excellence for advanced materials research focusing on developing the next generation of computational tools, databases and experimental techniques in order to enable the accelerated design of novel materials and their integration to industry, one of the primary goals of the U.S. Government's Materials Genome Initiative (MGI).\n", "\thomepage: http://chimad.northwestern.edu/\n", "\tparent_organizations: National Institute of Standards and Technology\n", "\tpermission_groups: public\n" ] } ], "source": [ "mdf.describe_organization(\"CHiMaD\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also get a brief overview of an organization without the technical details by setting `summary=True`. `describe_organization()` also supports the `raw` argument to get results back as a dictionary (`raw` overrides `summary`)." 
] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " National Institute of Standards and Technology\n", "\taliases: NIST\n", "\tdescription: The National Institute of Standards and Technology (NIST) was founded in 1901 and is now part of the U.S. Department of Commerce. NIST is one of the nation's oldest physical science laboratories.\n", "\thomepage: https://www.nist.gov/\n" ] } ], "source": [ "mdf.describe_organization(\"NIST\", summary=True)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'error': None,\n", " 'organization': {'aliases': ['NIST MDR', 'MDR'],\n", " 'canonical_name': 'NIST Materials Data Repository',\n", " 'description': 'The National Institute of Standards and Technology has created a materials science data repository as part of an effort in coordination with the Materials Genome Initiative (MGI) to establish data exchange protocols and mechanisms that will foster data sharing and reuse across a wide community of researchers, with the goal of enhancing the quality of materials data and models.',\n", " 'homepage': 'https://materialsdata.nist.gov/',\n", " 'parent_organizations': ['National Institute of Standards and Technology'],\n", " 'permission_groups': ['public']},\n", " 'status_code': 200,\n", " 'success': True}" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdf.describe_organization(\"NIST MDR\", raw=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fetching Datasets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### fetch_datasets_from_results\n", "This method allows you to automatically collect all the datasets that have records returned from a search. 
In other words, if you search for `material.elements:Al` and a _record_ from OQMD is returned, you can pass that record to `fetch_datasets_from_results()` and get the OQMD _dataset_ entry back." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "records = mdf.search(\"dft.converged:true AND mdf.resource_type:record\")" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "{'data': {'endpoint_path': 'globus://e38ee745-6d04-11e5-ba46-22000b92c6ec/MDF/mdf_connect/prod/data/mdr_item_775_v1/',\n", " 'link': 'https://app.globus.org/file-manager?origin_id=e38ee745-6d04-11e5-ba46-22000b92c6ec&origin_path=/MDF/mdf_connect/prod/data/mdr_item_775_v1/'},\n", " 'dc': {'alternateIdentifiers': [{'alternateIdentifier': 'http://hdl.handle.net/11115/166',\n", " 'alternateIdentifierType': 'Handle'},\n", " {'alternateIdentifier': '775',\n", " 'alternateIdentifierType': 'NIST DSpace ID'}],\n", " 'creators': [{'creatorName': 'Valencia and P.N. Quested, J.J.',\n", " 'familyName': 'Valencia and P.N. Quested',\n", " 'givenName': 'J.J.'}],\n", " 'publicationYear': '2013',\n", " 'publisher': 'NIST Materials Data Repository',\n", " 'resourceType': {'resourceType': 'Dataset',\n", " 'resourceTypeGeneral': 'Dataset'},\n", " 'titles': [{'title': 'Thermophysical Properties'}]},\n", " 'mdf': {'ingest_date': '2018-11-15T19:09:44.202046Z',\n", " 'organizations': ['National Institute of Standards and Technology',\n", " 'U.S. 
Department of Commerce',\n", " 'DOC',\n", " 'MDR',\n", " 'NIST',\n", " 'NIST Materials Data Repository',\n", " 'NIST MDR'],\n", " 'resource_type': 'dataset',\n", " 'scroll_id': 0,\n", " 'source_id': 'mdr_item_775_v1.1',\n", " 'source_name': 'mdr_item_775',\n", " 'version': 1},\n", " 'services': {'mdf_search': 'This dataset was ingested to MDF Search.',\n", " 'mrr': 'This dataset was registered with the MRR.'}}" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res = mdf.fetch_datasets_from_results(records)\n", "res[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you don't need to keep the record results yourself, you can also call `fetch_datasets_from_results()` without arguments: it executes the current query and uses those results instead of results you pass in." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "{'data': {'endpoint_path': 'globus://e38ee745-6d04-11e5-ba46-22000b92c6ec/MDF/mdf_connect/prod/data/schleife_al_channel_v1-1/',\n", " 'link': 'https://app.globus.org/file-manager?origin_id=e38ee745-6d04-11e5-ba46-22000b92c6ec&origin_path=/MDF/mdf_connect/prod/data/schleife_al_channel_v1-1/'},\n", " 'dc': {'contributors': [{'affiliations': ['University of Illinois Urbana-Champaign'],\n", " 'contributorName': 'Schleife, Andre',\n", " 'contributorType': 'ContactPerson',\n", " 'familyName': 'Schleife',\n", " 'givenName': 'Andre'}],\n", " 'creators': [{'affiliations': ['University of Illinois Urbana-Champaign'],\n", " 'creatorName': 'Schleife, Andre',\n", " 'familyName': 'Schleife',\n", " 'givenName': 'Andre'}],\n", " 'dates': [{'date': '2017-10-10T15:45:40.065761Z', 'dateType': 'Collected'}],\n", " 'publicationYear': '2015',\n", " 'publisher': 'MDF (placeholder)',\n", " 'resourceType': {'resourceType': 'JSON', 'resourceTypeGeneral': 'Dataset'},\n", " 'subjects': [{'subject': 'data_link'}],\n", " 'titles': [{'title': 'Schleife Al 256 Channel'}]},\n", " 
'mdf': {'ingest_date': '2018-11-30T21:04:03.431302Z',\n", " 'resource_type': 'dataset',\n", " 'scroll_id': 0,\n", " 'source_id': 'schleife_al_channel_v1.1',\n", " 'source_name': 'schleife_al_channel',\n", " 'version': 1},\n", " 'services': {'mdf_search': 'This dataset was ingested to MDF Search.'}}" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res = mdf.match_field(\"material.elements\", \"Al\").fetch_datasets_from_results()\n", "res[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Aggregations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### aggregate\n", "Queries submitted with `search()` return at most 10,000 results. If this limit is too low, you can use `aggregate()` to retrieve _all_ results from a query, no matter how many. Be careful with this method, as it is easy to retrieve a very large number of results without meaning to. Consider running `search(your_query, limit=0, info=True)` first to find out how many results the query matches (see [Query info](#Query-info) above for more information)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this example, we first check how many results the query will retrieve before aggregating." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of results: 15057\n" ] } ], "source": [ "mdf.match_field(\"mdf.source_name\", \"oqmd*\").match_field(\"material.elements\", \"Pb\").exclude_field(\"material.elements\", \"Al\")\n", "res, info = mdf.search(limit=0, info=True, reset_query=False)\n", "print(\"Number of results:\", info[\"total_query_matches\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Assuming we want all of these results, we can use `aggregate()` on the same query, which is still in memory because the search above used `reset_query=False`." 
] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of results: 15057\n" ] } ], "source": [ "res = mdf.aggregate()\n", "print(\"Number of results:\", len(res))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }