...

Is there an online calculator to size my data set memory requirements?

 

You can find a basic memory requirements calculator at:

https://secure.mccombs.utexas.edu/public/datasetmemorysizer/default.aspx

 

How many bytes do I need to store results of my mathematical calculations?

 

Keep in mind that sizing the columns of your initial data set is not enough: that only tells you how much memory you need to load it. If you will use that data in calculations, you also need to work out the data types and memory requirements of the resulting data set.
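As a rough illustration of that point, the sketch below estimates the raw size of a table as rows times the summed byte width of its columns, first for the initial columns and then with a result column added. The function name `dataset_bytes`, the row count, and the 4-byte width chosen for column C are assumptions made for the example, and real applications add some overhead on top of this figure.

```python
def dataset_bytes(n_rows, column_bytes):
    """Approximate raw size of a table: rows times the summed byte width of its columns."""
    return n_rows * sum(column_bytes.values())

# Hypothetical example: one million rows, two 2-byte columns.
initial = dataset_bytes(1_000_000, {"A": 2, "B": 2})

# Adding a derived column C (assumed here to need 4 bytes) raises the requirement.
with_result = dataset_bytes(1_000_000, {"A": 2, "B": 2, "C": 4})

print(initial)      # 4000000
print(with_result)  # 8000000
```

In this made-up case the single result column doubles the memory needed, even though the initial data set was sized correctly.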

For example, consider a very simple data set with just one row and two columns (A and B), each holding a 2-byte number. If you want to add a third column (C) that is calculated from the first two columns, the data type requirements of column C will depend on the type of calculation you are performing.

 

If the value in column C below is the result of a calculation involving the values in columns A and B, the exact value of C depends on the kind of calculation (addition, subtraction, multiplication, etc.), and the number of bytes required by C depends on how large that result might be.

A (2 bytes)    B (2 bytes)    C (? bytes)
500            950            ?

  • If C is defined as A + B then C = 1,450, so column C could also use a 2-byte data type, since that is big enough to store this number.
  • If C is defined as A x B then C = 475,000, so column C would need something larger than a 2-byte data type, since a 2-byte integer can hold at most 65,535.
  • If C is defined as A / B then C = 0.5263..., so column C would need a 4-byte or 8-byte floating-point data type, depending on how precise you want your decimal number to be.
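The three cases above can be checked with a short sketch. It takes A = 500 and B = 950, the values consistent with the sum and quotient quoted above, and uses a hypothetical helper, `smallest_signed_int_bytes`, that assumes signed integers with standard widths of 1, 2, 4, and 8 bytes. The 4-byte versus 8-byte comparison relies on Python's standard `struct` module.

```python
import struct

def smallest_signed_int_bytes(value):
    """Return the smallest standard signed integer width (in bytes) that can hold value."""
    for n_bytes in (1, 2, 4, 8):
        bits = 8 * n_bytes
        if -(2 ** (bits - 1)) <= value <= 2 ** (bits - 1) - 1:
            return n_bytes
    raise OverflowError(f"{value} does not fit in 8 bytes")

a, b = 500, 950

print(smallest_signed_int_bytes(a + b))  # 2  (1,450 fits in -32,768..32,767)
print(smallest_signed_int_bytes(a * b))  # 4  (475,000 does not fit in 2 bytes)

# Division yields a fraction, so a floating-point type is needed.
quotient = a / b
as_4_byte = struct.unpack("<f", struct.pack("<f", quotient))[0]
as_8_byte = struct.unpack("<d", struct.pack("<d", quotient))[0]
print(as_4_byte == quotient)  # False: some precision is lost in 4 bytes
print(as_8_byte == quotient)  # True: Python floats are already 8-byte doubles
```

Whether the precision lost in the 4-byte case matters depends on how the result will be used.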

 

In the real world, your data sets and calculations won't be this simple, but you would still apply the same principle: when creating a new column based on the values of existing columns, you need to determine what the largest possible result might be.
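One way to find the largest possible result is to evaluate the calculation at the extreme values each input column can hold. The sketch below does this for two signed 2-byte columns; `result_range` is a hypothetical helper, and checking only the corner values is valid for addition, subtraction, and multiplication but not for division.

```python
from itertools import product
import operator

def result_range(op, a_range, b_range):
    """Bounds of op(a, b) evaluated at the corner values of the inputs."""
    corners = [op(a, b) for a, b in product(a_range, b_range)]
    return min(corners), max(corners)

INT16 = (-32_768, 32_767)  # range of a signed 2-byte integer

print(result_range(operator.add, INT16, INT16))  # (-65536, 65534)
print(result_range(operator.sub, INT16, INT16))  # (-65535, 65535)
print(result_range(operator.mul, INT16, INT16))  # (-1073709056, 1073741824)
```

In the worst case, all three results fall outside the 2-byte range but fit in a signed 4-byte integer. If you know your actual values are much smaller than the type's limits, you can pass those tighter ranges instead.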

When you have a choice of data types, choose one with a byte size large enough for what you need, but no larger. If you don't have a choice, because your application enforces its own default rules for byte sizes (as is the case with R), then be aware of what those byte sizes are and plan accordingly.
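To see what planning around fixed defaults can mean, the sketch below compares one column stored in a 2-byte type with the same column stored at R's usual widths, assumed here to be 4 bytes per value for integer vectors and 8 bytes per value for numeric (double) vectors. The row count is hypothetical, and per-object overhead is ignored.

```python
# Assumed widths: R stores integer vectors at 4 bytes per value and
# numeric (double) vectors at 8 bytes per value.
R_BYTES = {"integer": 4, "numeric": 8}

def column_bytes(n_rows, bytes_per_value):
    """Raw bytes for one column, ignoring any per-object overhead."""
    return n_rows * bytes_per_value

n_rows = 10_000_000  # hypothetical row count
print(column_bytes(n_rows, 2))                   # 20000000 with a 2-byte type
print(column_bytes(n_rows, R_BYTES["integer"]))  # 40000000 as an R integer
print(column_bytes(n_rows, R_BYTES["numeric"]))  # 80000000 as an R numeric
```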

 

...