- Author: Olivia Ildefonso
- Full Title: Top Mapping Mistakes
- Category: articles
- Document Tags: #geospatial
- URL: https://digitalfellows.commons.gc.cuny.edu/2021/05/12/top-mapping-mistakes/
Highlights
- Using a multicolored categorization for quantitative data. Multicolored categorizations are best for qualitative data, such as a map that shows different types of vegetation. Additionally, the color choices shouldn’t be random but should reflect the category: woodland (brown), forest (green), tundra (white or light brown), etc.
- Using choropleth maps to display absolute numbers. Choropleth maps represent values through shading patterns. They should be used to display normalized data, which is data expressed relative to a total such as the whole population (e.g., rates, percentages, proportions, per capita values, medians, or averages).
- Choosing the wrong color ramp. A bi-chromatic color ramp (one that runs between two distinct colors, often called a diverging ramp) should be used with data that falls above and below a midpoint (e.g., above and below sea level). A single-color ramp (often called a sequential ramp) should be used to display a continuum without a midpoint (e.g., percent of population fully vaccinated).
- Not documenting the process. This can lead to a lot of confusion down the road if you have to recreate your map. Be sure to record where you got the data, how you cleaned the data, and your general workflow, including the spatial operations you performed and their settings.
- Not taking the time to familiarize yourself with your data. A student once told me that there was something wrong with QGIS because they had over 5,000 rows in their map layer, but only a few hundred points showed up on the map.
- Not cleaning your data. Taking the extra steps to clean your data before you import it into your mapping software will help tremendously in the long run.
- Assuming that the default parameters for a tool are the most appropriate. While the GIS software is programmed to read and respond to certain aspects of your data, remember that you are the one who knows the data the best. If you let your mapping software make decisions for you, it could easily result in a map with compounded inaccuracies.
- Not being aware of the geographic coordinate system and the projected coordinate system. A geographic coordinate system (GCS) tells the software where to draw the data and a projected coordinate system (PCS) tells it how to draw the data by flattening a 3D world onto a 2D surface. GCSs and PCSs work together and there are many to choose from. Each GCS is designed to fit a different part of the world and each PCS is designed to reduce different types of distortions. If you haphazardly combine them, your data may draw in the wrong place and your map may include unforeseen distortions. It’s important to decide which GCSs and PCSs you want to use and transform the map layers so they will be drawn without unknown errors. Check out this helpful article for more on how to select the right geographic transformation.
- Not knowing how the data was collected. Data collection is a tricky process, fraught with subjective decisions and human errors. If you are not collecting the data yourself, it’s even more important to take the time to learn about your data sources and how they went about collecting, aggregating, manipulating, and cleaning the data.
- Not selecting an appropriate resolution for the data. Does the data’s resolution make sense for answering your research question? A lot of people use Census data, such as Census tracts, but sometimes these aren’t the most appropriate boundaries. For example, if you are studying a spatial phenomenon at the community level, do the boundaries that you’ve selected have meaning to those communities? Do they represent the communities’ sense of place?
- Not fully understanding how the spatial operation works. Let’s say you need to aggregate data from the Census tract level to the Census Designated Place (CDP) level. You’d need to perform a Spatial Join by Location that compares the boundaries of both layers and aggregates the Census tracts in each CDP. But how would the mapping software decide which Census tracts to include and which to leave out? If only a tiny bit of one Census tract is inside a CDP boundary, should the whole tract be counted? Learning how the Spatial Join by Location operation works is necessary to answer these critical questions.
- Not keeping your data organized. Although shapefiles are still the most widely used vector data format, handling them can be a pain. Each shapefile is not just one file but a set of files that must be kept together for the data visualization to work. Additionally, as you perform spatial joins and other operations, your number of files can quickly grow. To save yourself future headaches, be sure to establish a proper file management structure.
- Giving your variables names that are too long, contain spaces, or have special characters. For shapefiles, there is a 10-character limit for attribute names. This can be a problem if you join a CSV that has attribute names longer than 10 characters: the names will be truncated, which could make them unreadable. Keep your attribute names short and free of spaces and special characters, since those can cause additional issues. Additionally, if using QGIS, note that the automatic join setting adds a prefix to each of the joined fields. The default prefix is the joined layer’s name, so if that name is long, the prefix can crowd out your field names entirely, making them impossible to decipher.
- Not getting the data into the correct format before importing it into your mapping software. Be careful if you edit a CSV dataset with Excel before importing it into your GIS software, as Excel might automatically change long numbers into scientific notation. Once you save the CSV, those long numbers will be truncated. This is particularly problematic for US Census numeric IDs. Excel is also known for misformatting date fields. Be sure to check all of the attribute field types before importing the text file into your mapping software.
- Not setting up the “environmental settings” when working with raster data. I admittedly only work with vector data, but a member of the GIS Working Group provided this recommendation for raster users. They warned that not setting the mask, cell size, and snap-to options can cause significant problems. They recommended checking the environmental settings at both the general and tool level before running any tool.
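
The choropleth highlight above recommends mapping normalized values rather than absolute numbers. As a minimal sketch of what that normalization might look like before the data reaches the mapping software, the snippet below turns raw counts into rates per 1,000 residents. The area names, counts, and populations are made up for illustration, and `rate_per_1000` is a hypothetical helper, not part of any GIS package.

```python
def rate_per_1000(count, population):
    """Return count per 1,000 residents, or None when population is missing or zero."""
    if not population:
        return None
    return count * 1000 / population


# Hypothetical example areas: (name, raw count, total population).
areas = [
    ("Area A", 500, 100_000),
    ("Area B", 200, 10_000),
    ("Area C", 0, 0),  # no population on record, so no rate can be computed
]

rates = {name: rate_per_1000(count, pop) for name, count, pop in areas}

for name, rate in rates.items():
    print(name, rate)
```

Area A has the larger raw count, but Area B has the higher rate, which is the difference a choropleth of absolute numbers would hide.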
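
The spatial join highlight above asks whether a tract that barely overlaps a CDP should be counted. The sketch below illustrates how two plausible inclusion rules can give different answers. It is a simplification under stated assumptions: rectangles stand in for real polygons, the coordinates are invented, and neither rule is claimed to be what any particular GIS tool does by default.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle standing in for a polygon (a simplification)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self):
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)


def overlap_fraction(tract, place):
    """Share of the tract's area that falls inside the place."""
    width = min(tract.xmax, place.xmax) - max(tract.xmin, place.xmin)
    height = min(tract.ymax, place.ymax) - max(tract.ymin, place.ymin)
    if width <= 0 or height <= 0:
        return 0.0  # no shared area (touching edges count as no overlap here)
    return (width * height) / tract.area()


# Hypothetical geometry: one CDP and three tracts.
cdp = Box(0, 0, 10, 10)
tracts = {
    "tract_1": Box(1, 1, 5, 5),    # entirely inside the CDP
    "tract_2": Box(9, 0, 19, 10),  # only a sliver inside
    "tract_3": Box(20, 0, 25, 5),  # entirely outside
}

fractions = {name: overlap_fraction(box, cdp) for name, box in tracts.items()}

# Rule 1: include any tract that overlaps the CDP at all.
intersects = [name for name, f in fractions.items() if f > 0]
# Rule 2: include only tracts with more than half their area inside the CDP.
majority = [name for name, f in fractions.items() if f > 0.5]
```

Under the first rule the sliver tract is aggregated into the CDP; under the second it is left out. Knowing which rule a join applies is the kind of detail the highlight urges you to check.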
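
The attribute-name highlight above cites a 10-character limit for shapefile field names and warns about join prefixes. One way to catch problems early is to check names before joining. The sketch below is an illustration only: `field_name_problems` is a hypothetical helper, the layer and field names are invented, and the plain left-to-right truncation is an assumption, since actual software may shorten or rename fields differently.

```python
import re

SHAPEFILE_NAME_LIMIT = 10  # attribute-name limit cited in the highlight above


def field_name_problems(name, prefix=""):
    """List reasons why prefix + name may not survive in a shapefile."""
    full = prefix + name
    problems = []
    if len(full) > SHAPEFILE_NAME_LIMIT:
        # Assumption: the name is simply cut after the first 10 characters.
        problems.append(f"truncated to {full[:SHAPEFILE_NAME_LIMIT]!r}")
    if not re.fullmatch(r"[A-Za-z0-9_]+", full):
        problems.append("contains spaces or special characters")
    return problems


print(field_name_problems("pop_2020"))
print(field_name_problems("median household income"))
# A long join prefix can consume the whole limit on its own.
print(field_name_problems("pop_2020", prefix="census_tracts_"))
```

In the last call the prefix alone exceeds the limit, so nothing of the original field name remains after truncation.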
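
The CSV highlight above warns that long numeric IDs can be turned into scientific notation. A quick check before importing might look like the sketch below, which reads IDs as text and flags any value that is not a plain digit string of the expected length. The CSV content and column names are invented for illustration, `suspicious_ids` is a hypothetical helper, and the length of 11 assumes Census tract GEOIDs.

```python
import csv
import io

# Hypothetical CSV text; the column names and values are made up for illustration.
raw = "GEOID,pop\n36047000100,4000\n3.60470E+10,3500\n06001400100,2800\n"


def suspicious_ids(csv_text, id_field, expected_len):
    """Return ID values that are not plain digit strings of the expected length."""
    reader = csv.DictReader(io.StringIO(csv_text))  # csv keeps every value as a string
    return [
        row[id_field]
        for row in reader
        if not (row[id_field].isdigit() and len(row[id_field]) == expected_len)
    ]


bad = suspicious_ids(raw, "GEOID", 11)
print(bad)
```

Because the IDs are never converted to numbers, the leading zero in the third row is preserved, and only the value already damaged by scientific notation is flagged. A flagged value cannot be repaired from the CSV alone; it has to be restored from the original source.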