Analyze Your Data Schema

The Schema tab provides an overview of the data type and shape of the fields in a particular collection. Databases and collections are visible in the left-side navigation.

The overview is based on sampling the documents in the collection. The schema overview may include additional data about the contents of the fields, such as the minimum and maximum values of dates and integers, the frequency of occurrence of particular values, and the cardinality of the data.

MongoDB has a flexible schema model, which means that some fields may contain different types of data from one document to the next. For example, a field named address may contain strings and integers in some documents, objects in others, or some combination of all three.

For heterogeneous fields, the Schema tab shows a breakdown of the data types the field contains, with the percentage of documents that use each type.

Example

The Schema tab shows size information about the test.restaurants collection at the top, including the total number of documents in the collection, the average document size, and the total disk space occupied by the collection.

The following fields are shown with details:

  • The _id field is an ObjectId. Each ObjectId contains a timestamp, so Atlas displays the range of creation times for the sampled documents.

  • The address field contains four nested fields. You can expand the field panel to see analyses of each of the nested fields.

  • The borough field contains a string indicating the borough in which the restaurant is located. The cardinality is low enough that Atlas can provide a graded bar of the field contents, with the most-frequently occurring string on the left.

  • The grades field contains arrays of strings. The analysis shows the minimum, maximum, and average array lengths.

Example of a collection's schema
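
The creation-time range shown for the _id field can be reproduced outside of Atlas, because the first 4 bytes of an ObjectId store a big-endian count of seconds since the Unix epoch. The following is a minimal sketch, not the Atlas implementation; the function names are hypothetical and the input is assumed to be the 24-character hexadecimal form of each ObjectId.

```python
from datetime import datetime, timezone


def objectid_creation_time(object_id_hex: str) -> datetime:
    """Return the creation time embedded in a 24-character ObjectId hex string."""
    if len(object_id_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, written as 24 hex characters")
    # The first 4 bytes (8 hex characters) hold seconds since the Unix epoch.
    seconds = int(object_id_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def creation_time_range(object_id_hexes):
    """Return (earliest, latest) creation times for a sample of ObjectIds."""
    times = [objectid_creation_time(h) for h in object_id_hexes]
    return min(times), max(times)
```

Calling creation_time_range on the _id values of a sample returns the two endpoints of the range.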

Using the query bar in the Schema tab, you can create a query filter to limit your result set. Click the Options button to specify query options, such as the particular fields to display and the number of results to return.

Note

For query result sets larger than 1000 documents, Atlas shows a subset of the results. Otherwise, Atlas shows the entire result set.

For details on sampling, see Sampling.
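
The 1000-document rule in the note can be illustrated with a short sketch. How Atlas actually picks the subset is not described on this page, so the uniform random choice below is an assumption, and documents_to_analyze is a hypothetical name.

```python
import random

SAMPLE_LIMIT = 1000  # threshold stated in the note above


def documents_to_analyze(result_set, limit=SAMPLE_LIMIT, seed=None):
    """Return the whole result set if it is small enough, otherwise a subset."""
    if len(result_set) <= limit:
        return list(result_set)
    # Assumption: the subset is a uniform random sample without replacement.
    rng = random.Random(seed)
    return rng.sample(result_set, limit)
```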

Query bar schema view

Tip

In the Schema tab, you can also use the Query Builder to enter a query into the query bar.

For each field, Atlas displays summary information about the data type or types the field contains and the range of values. Depending on the data type and the level of cardinality, Atlas displays histograms, graded bars, geographical maps, and sample data to provide a sense of the shape and scope of the data contained in each field.

Below is an example of the data type summary for a field called last_login, which contains data of type date.

Example of a field with a single data type

For fields that contain multiple data types, Atlas displays a percentage breakdown of the various data types across documents. In the example below, the chart shows the contents of a field called phone_no, which holds an int32 value in 20% of documents and a string value in the remaining 80%.

Example of percentage breakdown for data types

If a collection contains documents in which not all fields contain a value, the missing values display as undefined. In the example below, the field age has no recorded value in 40% of the sampled documents.

Example of sparsely applied data type
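
The percentage breakdown for mixed-type fields and the undefined share for missing fields can be approximated with a few lines of Python. This is a sketch over plain dictionaries, not the analysis Atlas runs; type_breakdown and the TYPE_NAMES mapping are illustrative assumptions.

```python
from collections import Counter

# Display names for a few Python types. This mapping is illustrative only.
TYPE_NAMES = {
    str: "string",
    int: "int32",
    float: "double",
    bool: "boolean",
    dict: "document",
    list: "array",
}


def type_breakdown(documents, field):
    """Return {type name: percentage of documents} for one field."""
    counts = Counter()
    for doc in documents:
        if field not in doc:
            # Documents without the field are counted as undefined.
            counts["undefined"] += 1
        else:
            value_type = type(doc[field])
            counts[TYPE_NAMES.get(value_type, value_type.__name__)] += 1
    total = len(documents)
    if total == 0:
        return {}
    return {name: 100 * n / total for name, n in counts.items()}
```

For five documents in which phone_no is an integer once and a string four times, the function returns 20.0 for int32 and 80.0 for string.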

Strings can appear in three different ways. If there are entirely unique strings in a field, Atlas shows a random selection of string values from the specified field. Click the circular refresh icon to see a new set of randomly selected values from the field.

Example of string data types

If there are only a few different string values, Atlas shows the strings in a single graded bar in which each segment indicates the percentage of values that match that string.

Example of few string data types

If there are multiple string values with some duplicates, Atlas shows a histogram indicating the frequency of each string found within the field.

Example of string data types as a histogram

Note

Move the mouse over each bar to display a tooltip which shows the value of the string.
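
The data behind a graded bar or a string histogram is a frequency count of the values in the field. The sketch below computes each value's share, most frequent first; value_shares is a hypothetical helper, and the thresholds Atlas uses to choose between the three displays are not stated here, so they are not modeled.

```python
from collections import Counter


def value_shares(values):
    """Return [(value, percentage)] sorted from most to least frequent."""
    counts = Counter(values)
    total = len(values)
    # most_common() orders by count, descending; ties keep first-seen order.
    return [(value, 100 * n / total) for value, n in counts.most_common()]
```

If every value in the result has the same small share and the list is as long as the input, the field holds only unique strings, which is the case where Atlas shows a random selection instead.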

Numbers are similar to strings in their representation. Unique numbers are shown in the following manner:

Example of number data type

Duplicate numbers are shown in a histogram that indicates their frequency:

Example of duplicate number data types

Fields that represent dates (and fields that contain the ObjectId data type, which includes a timestamp) are shown across multiple bar charts. The two charts on the top row represent the day of the week and time of day of the timestamp value.

The single chart on the bottom shows the first and last timestamp value, and the vertical lines represent the distribution of the timestamp across the range of first to last.

Example of Date data types
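
The three date charts correspond to simple aggregations over the timestamps. The following sketch assumes the values are already Python datetime objects; date_summary is a hypothetical name, and grouping time of day by hour is an assumption about the chart's bucket size.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def date_summary(timestamps):
    """Summarize timestamps by weekday, by hour, and by overall range."""
    if not timestamps:
        raise ValueError("at least one timestamp is required")
    return {
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        "day_of_week": Counter(DAYS[ts.weekday()] for ts in timestamps),
        "hour_of_day": Counter(ts.hour for ts in timestamps),
        "first": min(timestamps),
        "last": max(timestamps),
    }
```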

Fields that contain a sub-document or an array are displayed with a small triangle next to them and a visual representation of the data contained within the sub-document or array.

Example of fields with embedded documents or arrays

Click on the triangle to expand the field and view the embedded documents:

Expanding the embedded documents

If a field has mixed types, you can view different charts of each type by clicking on the type field. In the example below, the age field shows the values that are strings:

Example of a field with mixed types

Clicking on the int32 type causes the chart to show its numeric data:

Example that shows numeric data for number type

In the Schema tab, you can type the filter manually into the query bar or generate the filter with the Atlas query builder. The query builder allows you to select data elements from one or more fields in your schema and construct a query matching the selected elements.

Tip

You can compose the initial query filter by using the clickable query builder and then manually edit the generated filter to your exact requirements.

The following procedure describes the steps involved in building a complex query with the query bar.

1

In the Schema view, you can click on a chart value to build a query. For example, the following image shows the query filter built by clicking the Manhattan value for the borough field.

Example of a created filter
2

To select multiple values for a field, click and drag the cursor over a selection of values, or hold Shift and click the desired values.

Example of selecting multiple values for a field
3

For example, the following image shows the compound query built by selecting values in the cuisine field.

Example of a compound query
4

To deselect a previously selected value, shift+click on the selected value:

Example of removing a value from a filter
5

To run the query, click Analyze. Click Reset to clear your query.
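
The selections made in the steps above map naturally onto a MongoDB filter document. The sketch below shows one plausible mapping: a single selected value becomes an equality match and several selected values become an $in clause. The exact filter text that Atlas generates is an assumption here, and build_filter is a hypothetical helper, though $in itself is a standard MongoDB query operator.

```python
def build_filter(selections):
    """Build a filter document from {field: [selected values]}."""
    query = {}
    for field, values in selections.items():
        if not values:
            # A field with every value deselected contributes nothing.
            continue
        if len(values) == 1:
            query[field] = values[0]
        else:
            query[field] = {"$in": list(values)}
    return query
```

For example, selecting Manhattan for borough and two values for cuisine yields a filter with an equality match on borough and an $in clause on cuisine. You can paste a filter of this shape into the query bar and edit it by hand.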

If the analysis of your schema times out, it might be because the collection you are analyzing is very large, causing MongoDB to stop the operation before the analysis is complete. Increase the value of MAX TIME MS to allow the operation time to complete.

To increase the value of MAX TIME MS:

  1. In the query bar, expand Options.

    The Options button is on the right side of the query bar, next to the Analyze button.
  2. Increase the value of MAX TIME MS to accommodate your collection. MAX TIME MS defaults to 60000 milliseconds, or 60 seconds, but large collections might take longer than that to analyze.

Once you have increased the value of MAX TIME MS, retry your schema analysis by clicking Analyze.
