Feature to output input values to a log file #32
Comments
Cells created by |
Isn't it possible to just store the content of the cell regardless how it had been created? |
How about this?
from modelx import *
import pandas as pd
m, s = new_model(), new_space('s')
s.y = pd.Series([1, 2], pd.Index([3, 4], name='a'))
@defcells
def x(a):
    return y[a]
s.x[3] = 10
write_model(m, 'm')
m2 = read_model('m')
print(m.s.x[3], m2.s.x[3]) |
Well, I'm loading all the data from pandas. I could probably write my own "from_pandas" function and in this way trick the saving engine into not applying the default "from_pandas" encoding. In general, I'm struggling to benefit from the ability to compare two versions of a saved model. Very frequently I use a dataframe stored in a cell without arguments. I can't create it with the "from_pandas" function, so I call the "new_cells" function directly and assign the value to it. In this case the encoder doesn't give the data file a unique name, and I can't compare the two models. The ideal situation, as I imagine it, is to have the modelx structure (the argument-value relationships) stored in a JSON file, with non-plain values stored as references instead of values. For pandas objects the reference could be checksum-driven (MD5). |
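To illustrate the checksum-driven reference idea, here is a minimal sketch of how an MD5 digest could be derived from the content of a pandas object. The helper name `pandas_md5` is hypothetical and not part of modelx; it relies only on `pd.util.hash_pandas_object` and the standard `hashlib` module.

```python
import hashlib
import pandas as pd

def pandas_md5(obj):
    """Return an MD5 hex digest of a DataFrame's or Series' content (hypothetical helper)."""
    h = hashlib.md5()
    # hash_pandas_object yields one uint64 per row, covering the index and the values
    row_hashes = pd.util.hash_pandas_object(obj, index=True)
    h.update(row_hashes.to_numpy().tobytes())
    # Row hashes do not cover labels, so mix in the column names (or the Series name)
    labels = list(obj.columns) if isinstance(obj, pd.DataFrame) else [obj.name]
    h.update(repr(labels).encode("utf-8"))
    return h.hexdigest()

df_a = pd.DataFrame({"x": [1, 2], "y": [3.0, 4.0]})
df_b = df_a.copy()                    # same content, different object in memory
df_c = df_a.assign(y=[3.0, 5.0])      # one value changed

print(pandas_md5(df_a) == pandas_md5(df_b))
print(pandas_md5(df_a) == pandas_md5(df_c))
```

A digest like this depends only on the content, so two saved versions of a model could be compared by the references alone, without opening the data files.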
All Reference values and Cells' input values are stored in For example,
import pandas as pd
import modelx as mx
df = pd.DataFrame()
space = mx.new_space()
@mx.defcells
def foo(): pass
@mx.defcells
def bar(): pass
foo[()] = df
bar[()] = [df]
space.baz = {1: [df]}
Then I am thinking of introducing a new type of Reference that represents data read from files.
space.new_data(name="y", value=df, path="C:/data.file", encoder="pickle")
Then you can do like:
This way, |
Why is this useful: |
So that changes in |
I don't think it is that useful to have such a change-tracking mechanism explicitly implemented around the data object. I'm pretty happy with storing df in a cell ( I think you have an extremely strong and flexible framework around cells, which is to a large extent inspired by the Excel framework. I would spend time on fully leveraging it rather than introducing new constructs. |
Another argument here is the need to maintain consistency between data items and overwrites of the formula driven cells. |
If no object identity consistency, then the |
This is about appropriate usage of a high-level programming language. It will of course be a problem if multiple copies of an object are created with minor modifications for no reason. It would also be a problem in Excel if a vector of 10,000 cells were copied in a formula 1,000 times, adding one element each time. Have you experienced a practical issue due to the copies in memory? |
Btw. |
Let's say modelx Cells are always pass-by-reference, and cannot trace changes of mutable objects' contents. |
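The point about pass-by-reference can be shown without modelx at all. The sketch below uses a plain dict as a stand-in for a cells cache (an assumption for illustration, not modelx's actual internals) and a list as a stand-in for a mutable object such as a DataFrame.

```python
# A plain dict standing in for a cache of cells values (illustrative assumption)
cache = {}

df_like = [1, 2, 3]                   # stand-in for a mutable object such as a DataFrame
cache[("foo", ())] = df_like          # stored by reference, no copy is made

# A dependent value is computed once from the stored object and cached
cache[("total", ())] = sum(cache[("foo", ())])

df_like.append(4)                     # in-place mutation; the cache has no hook to notice it

stale = cache[("total", ())]          # still the value computed before the mutation
fresh = sum(cache[("foo", ())])       # what a recalculation would give
print(stale, fresh)
```

Because the stored object and `df_like` are the same object, the mutation is visible through the cache, but nothing invalidates the dependent value.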
Why can't in your example df be stored as a cell value and the other cells referencing it? |
You mean this code?
import pandas as pd
import modelx as mx
df = pd.DataFrame()
space = mx.new_space()
@mx.defcells
def foo(): pass
@mx.defcells
def bar(): pass
foo[()] = df
bar[()] = [df]
space.baz = {1: [df]}
How can |
Won't this work:
|
You can do it manually, but I cannot think of any way to make
|
Sorry I wasn't precise. I meant this
|
Whether to assign My revised idea is to add
import pandas as pd
import modelx as mx
df = pd.DataFrame()
space = mx.new_space()
@mx.defcells
def foo(): pass
@mx.defcells
def bar(): pass
foo[()] = df
bar[()] = [df]
space.baz = {1: [df]}
foo.file = "foodata"
Then the input value of I need to think about how to specify |
It looks complex and introduces inconsistency. It also doesn't allow tracking changes in specific data entries. My actual use case is as follows:
It isn't essential for me if
Where This approach allows consistent treatment of the overwrites of the formula cells:
|
If The current implementation stores the object ids of keys and values in
And
The objects have different ids when the model is read back. |
I assume you are talking about the case, when there is actually something different between 2 dfs - not just that they are stored separately in memory, but MD5 of I can think of a few options to implement it with stable keys and overcome the issue with duplicate hash:
Btw. why don't you store objects in the pickled dictionary using args tuple as a key? |
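As a rough illustration of the difference between id-based and args-based keys, the sketch below round-trips two dictionaries through `pickle`. The key layout `(cells_name, args)` is an assumption made for the example, not the format modelx uses.

```python
import pickle

value = [1.0, 2.0]

# Keyed by object id: the key is tied to where the object lives in memory
by_id = {id(value): value}

# Keyed by (cells name, args tuple): the key depends only on the model structure
by_args = {("foo", ()): value, ("x", (3,)): 10}

restored_by_id = pickle.loads(pickle.dumps(by_id))
restored_by_args = pickle.loads(pickle.dumps(by_args))

restored_value = restored_by_args[("foo", ())]

# The restored object is a new object, so its id no longer matches the saved key
same_id = id(restored_value) == id(value)

# The args-based keys survive the round trip unchanged
print(same_id, restored_by_args == by_args)
```

Keys built from names and argument tuples stay the same from one save to the next, which is what makes two saved versions comparable.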
It took me long enough to figure out why you want to store data in a separate file for a variable. I guess you do it to allow running the model with a different file.
The limitations of the above solution are
I think both aren't an issue in case of The limitation 1 can be resolved by designing the CSV above to be consistent with |
How about adding an option to output a single log file that contains the MD5 of all input values:
Isn't this enough for your purpose? I want to make it disabled by default as it slows down writing. |
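A log of this kind could look roughly like the sketch below. The function names, the `"name(args)"` key format and the JSON layout are all assumptions for illustration and do not describe what modelx writes. Hashing the pickled bytes is also a simplification, since pickle output is not guaranteed to be identical across library versions.

```python
import hashlib
import json
import pickle

def input_log(inputs):
    """Map "cells_name(args)" to the MD5 of the pickled input value (hypothetical layout)."""
    log = {}
    for (name, args), value in inputs.items():
        digest = hashlib.md5(pickle.dumps(value, protocol=4)).hexdigest()
        log[f"{name}{args!r}"] = digest
    return log

def write_input_log(inputs, path):
    """Write the log as sorted JSON so that two logs can be compared with a text diff."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(input_log(inputs), f, indent=2, sort_keys=True)

# Two versions of the same inputs; only x[3] differs
old = input_log({("x", (3,)): 10, ("foo", ()): [1, 2]})
new = input_log({("x", (3,)): 11, ("foo", ()): [1, 2]})

changed = sorted(key for key in old if old[key] != new.get(key))
print(changed)
```

Comparing two such logs shows which input values changed between saves without opening the data files themselves.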
This is a good alternative :-) What do you think of |
Will you fix the overwrites saving issue? |
I want to fix it but it may take a while to get to it. Can you not use |
It works, thank you |
One more thing, as we are talking about model saving: have you thought about zip-archiving the folder? It would save space and make it more convenient to send the model around. We save the model on OneDrive, and it ends up syncing a hundred files instead of one.
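Until something like this is built in, a saved model folder can be archived manually with the standard library. The sketch below is a workaround under that assumption; `zip_model_dir` is a hypothetical helper name, and the folder `m` with `data.txt` is a placeholder for a real saved model.

```python
import shutil
import tempfile
from pathlib import Path

def zip_model_dir(model_dir):
    """Pack a saved model folder into <folder>.zip next to it and return the archive path."""
    model_dir = Path(model_dir)
    archive = shutil.make_archive(
        base_name=str(model_dir),        # ".zip" is appended automatically
        format="zip",
        root_dir=str(model_dir.parent),
        base_dir=model_dir.name,         # keep the folder name inside the archive
    )
    return Path(archive)

# Demonstration with a throwaway folder standing in for a saved model
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "m"
    folder.mkdir()
    (folder / "data.txt").write_text("x")

    archive = zip_model_dir(folder)
    archive_name = archive.name

    # Unpack elsewhere to confirm that the folder round-trips
    shutil.unpack_archive(str(archive), str(Path(tmp) / "out"), format="zip")
    restored_text = (Path(tmp) / "out" / "m" / "data.txt").read_text()

print(archive_name, restored_text)
```

The single archive can then be synced or sent around, and unpacked with `shutil.unpack_archive` before the model is read back.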