dlbass93 VIP
Total posts: 44
06 Авг 2013 21:42

I have been working on a project ( http://928registry.928oc.org ), and Cobalt 8 has been phenomenal. We are attempting to rapidly grow the project by importing a massive amount of data. I have been unable to figure out how the import process works with Cobalt 8, so I am reaching out for some assistance. If anyone is willing and able to help out, I would really appreciate talking to you. What we want to do may not even be feasible, below is a summary.

Import all Porsche 928s based on VIN number. We will gain access to a large amount of vehicle specific data from Porsche such as paint, option codes, transmission, and so on. We would like one record to be created per car, and for it to update any existing cars (if possible).

That is about it. If you are interested, please let me know.

Thanks guys,

Derrick

Последние изменения: 02 Март 2014


Sergey
Total posts: 13,748
07 Авг 2013 10:02

Did you try to import anything at all? I mean do you familiar with basic import flow? Your question is "Where to start?" or "Who will help?"


Sackgesicht VIP
Total posts: 1,636
07 Авг 2013 10:07

Derrick,

it might be useful to analyze and prepare the data towards a Cobalt structure first.

Import as of now still has some limitations, which you can overcome with some tricks.

Based on the above information it is quite hard to give valuable advice. Better wait until you get the data and then lets look deeper into it.

BTW, it seems your listing is gaining traction .. already more than 400 records ...


dlbass93 VIP
Total posts: 44
07 Авг 2013 11:46

Sergey, I think I have the import setup properly, but once I select my csv, the next button is never clickable, so I must have done something wrong. I attempted to update to the latest release yesterday, and the upload kept timing out, and I couldn't get the alternate installation methods to work either. Probably an issue with my server.

Sackgesicht, We do have the data, I will try to get a sample ready and share it here so you guys can see it.

Thanks for the help guys!


londoh VIP
Total posts: 137
07 Авг 2013 12:26

Hi

I've also recently been looking at importing 'large' (10k - 50k) amounts of records (I dunno how many 928's were made but I would guess that might be a 'massive' number!)

Import as of now still has some limitations, which you can overcome with some tricks

@ Sackgesicht - Can I ask what you see as the limitations, and what tricks have you used to overcome them?

Personally I found the biggest limitation is that you simply cant deal with that amount of data over a post request - browser hangs and/or server times out. The obvious thing is splitting it up into manageable chunks but thats also a painful prospect. And with a lot of categories as I have its still not easy to deal with even a reasonable chunk of my data.

So for my part I set out to write a command line script so I can automate imports. Other stuff has the priority this week so its not finished yet, but I'm pretty sure it'll be possible.


dlbass93 VIP
Total posts: 44
07 Авг 2013 19:44

Here is a sample of the data we have. We have this for every 928 made, I'm guessing over 30,000.

What do you think we can do with this?


Sackgesicht VIP
Total posts: 1,636
07 Авг 2013 22:34

Sergey, I think I have the import setup properly, but once I select my csv

Derrick,

i had a similar experience with one of my test files. I just made a 3 liner manually and it did not parse it properly. After importing it into a CSV tool and just saved it, it worked. I just did not investigate it further at the time. maybe CR/LF issues.

Sergey, I think I have the import setup properly, but once I select my csv

and the upload kept timing out

That is what ETH000 described before. When we upgraded from Mighty Resources to Cobalt we imported around 80K records with 230K files (actually the import was based on folder and filename). Some Cobalt functions were used in the process. The import took also quite some time and we had to chop the original data into pieces since Cobalt generated so many queries in the process and the index structure was not suited for this process. Since we imitated a form submission we had to play around with the indices to achieve a reasonable performance.

Sergey, I think I have the import setup properly, but once I select my csv

and the upload kept timing out

Can I ask what you see as the limitations, and what tricks have you used to overcome them?

Not all field types are supported

The import of child/parent for example,where the source has their own relation key fields.

Now you can just select 1 parent for all import records. With a different strategy to load them into a text field and change later the field parameter with an update script. Also querying the parent field based on another parent (text) proxy.

Sergey, I think I have the import setup properly, but once I select my csv

and the upload kept timing out

Can I ask what you see as the limitations, and what tricks have you used to overcome them?

Here is a sample of the data we have. We have this for every 928 made, I'm guessing over 30,000.

I think we have to think first about the final TYPE structure -- if using 1 TYPE or using several TYPES, then decoding the field info like the OptionsList field-- obviously it contains several groups of information divided by a separator (|)...

It might be a good example/exercise where we can all learn from each other, how to use Cobalt in a real world scenario.


londoh VIP
Total posts: 137
08 Авг 2013 03:02

@Sackgesicht

yes you raise some good points there which I hadnt thought about.

I guess for 2 main reasons:

This is my first project with cobalt and 1 Type seems easily enough!

Thankfully my data is fairly simple and the requirements modest

@Derrick

Sackgesicht talks about the OptionsList field being split by a separator and mentions a '|' (pipe char) But I see groups of digits separated by 'M' Which is it supposed to be?

I dont have windows or excel, or maybe something is lost in the forum up/download.

Apart from that field the data looks pretty straightforward.

From past experience I know excel will do weird things with csv. I suggest to get a copy of libreoffice and open and saveas csv with that. libreoffice will give you options for the csv format.

And as Sackgesicht say it may be the line endings - I used to use textpad on windows and from memory you can opt to save with unix line endings rather than windows cr/lf. That might help.

Also try opening your csv with a decent text editor (textpad again or similar) to see what you're actually trying to import.

there are some free progs around that will split text files into chunks for you.


Sackgesicht VIP
Total posts: 1,636
08 Авг 2013 05:10

But I see groups of digits separated by 'M' Which is it supposed to be?

Please see here for the Option Codes.

That's the reason why i was referring to it as option groups. The pipe separates them and the Mxxx will list the options within a group. Therefore my suggestion to decode them first ... Another option could be also a special field output template later to do the work, but then it might get difficult to edit/add records manually without creating a special field for it.


dlbass93 VIP
Total posts: 44
08 Авг 2013 11:50

The data above we just got yesterday. What I had been trying to import was a CSV, but it was not nearly as complete as the sample above. I can create a CSV of the new data, no problem, I just figured the xls would be easier to manipulate because of the row header sorts that are built in. The data was given to me in xls format, but I can convert it to whatever we need. If Windows is truly creating problems, I've got Mac and LinuxMint boxes if it will be better to create the CSV from one of those. All of these machines have LibreOffice on them (they have a gorgeous splash page on their site btw), and this Windows box has Office 2013 (I like some of the easy to use Excel stuff).

The upload timing out problem was occuring when trying to update Cobalt, not when trying to import records. I will keep working on getting updated to the newest version.

I am currently using one section, one type, and twenty-four fields. Six fields are currently set as required, but it would not be a problem for me if I needed to set those as optional for the import to work. As far as I know, the option codes field is the only one that is going to have multiple items in it. I have made that a multiple checkbox field in Cobalt. We have a small team of people who are willing and able to manipulate the data in any way that we need to make it work with Cobalt import. So, if we can figure out what the best way to organize the option codes will be, we will get it done. I know basically nothing about how Cobalt looks at the CSV and how it needs the data to look in order to import it.

I also have access to a more complete list of available option codes if that would be useful. We have come across cars that had option codes that may have only existed on that single car, and we have been adding those codes to our list as well. It is an extensive (yet still incomplete) list.

Another concern in the sample data I gave above is the underscore in the VIN. There should be a checksum digit there. We have a calculator to figure out that digit, and we also have the master VIN list for every 928 from 1981 on, which includes the checksum digit. If that underscore is going to be an issue in the import, then I can probably have our team start correcting it with the real full VIN. HERE is one of the master VIN lists.

Thanks for the input guys, we REALLY appreciate it! I can probably get access to any data that would help make this work out better, and we are not opposed to making changes to the Cobalt configuration if needed, especially if we can save that data that has already been entered. We are about one-quarter of the way done with importing the old site to the new Cobalt powered site, and it would stink if we lost all of that work :P


Sackgesicht VIP
Total posts: 1,636
08 Авг 2013 15:04

Derrick,

since you have the source list in excel format, stick with it. Just modify it towards your Cobalt structure.

So, if we can figure out what the best way to organize the option codes will be, we will get it done.

It will depend on your Cobalt fields. Maybe you can explain your target fields to better help you there.

Most probably you have to break the groups (delimiter =| ) in your source excel first. You can do this with the "Split Data into Multiple Columns Wizard". See here

So, if we can figure out what the best way to organize the option codes will be, we will get it done.

Another concern in the sample data I gave above is the underscore in the VIN.

If you want to import the VIN with the generated number instead of the underscore, i would suggest to use a formula in a new column in excel to compute it and store it there. No need for manual encoding.

You might have to split the VIN with the underscore into 2 strings, compute the digit and put it together -- or do the computation and replace the underscore with the digit. to get you started, see here

If you need help with it, share the checksum calculation.

For the options list i would suggest to break the groups up first. You can do this with the "Split Data into Multiple Columns Wizard" in excel. See here


Sackgesicht VIP
Total posts: 1,636
08 Авг 2013 15:16

I also have access to a more complete list of available option codes if that would be useful.

You might consider adding a TYPE with the option codes in Cobalt. Title is the option code, and an additional text field for the description .. If option codes are organized by groups, create categories for it.

Create a parent field in your car TYPE and link it to a child field in your options TYPE.

You will get a Cobalt powered option list on your side as well as a nice side effect of this setup.


Sackgesicht VIP
Total posts: 1,636
08 Авг 2013 15:49

Correction ... i just looked into the excel file ... maybe the option list is not devided into sections ... it is not the pipe character it is a capital i -- so most probably some of the option codes start with an capital m and some with an capital i, or the i should be replaced with an m ..

If this is the case, forget about "Split Data..". Depending on the way you want to use the data (target field), you will still have to convert the column into a different format. As you stated above, you want to use a checkbox field for it, we have to check how cobalt imports into a checkbox.

How many different option codes exist?

Based on this listing, they dont use the prefix letter


dlbass93 VIP
Total posts: 44
08 Авг 2013 20:12

Most of them do not have a prefix. As far as I can tell, none start with I or M. Some start with C, X, or Z. Our list is still growing, but it currently at 223. I have attached it. I think the I and M are used as delimiters, but who really knows?!

I really am not stuck on using any particular format for any of my fields, whatever would be best is fine by me. I just did what I thought was best at the time. We had a user mention having Cobalt intelligently only offering options (beyond option codes, including transmission and colors and so on) based on the VIN. But, that is way beyond my scope, so I kind of left it at simply being a request. I say that because we are open to doing whatever would be the best way to do it :)

I don't completely understand how I would have multiple Types in a single registration form, but maybe that is because I haven't tried to do it yet. If you suggestion of creating a type just for option codes still stands, I will get to work on that!


Sackgesicht VIP
Total posts: 1,636
08 Авг 2013 21:42

Like what JW stated, the letters are for sure not a delimiter. Delimiter should be a unique character.

Based on some other list on the internet, they use a 3 letter code for the option codes.

Like here, here, here (list with 271 option codes), here, here and here -

other pages use 4 letter codes (actually the same 3 letter just with a prefix letter) like here, here

A good overview over almost all Porsche option codes could be found here

Based on the plate picture, I saw on one of the pages, it seems Porsche is using only the 3 letter version and the first character of the 4 character version might be some other part identifier. If it is confirmed that the plate always show only the 3 letter version, i would suggest to go this way, since your users might be confused when they compare it against their original plate.

Btw, your attached list should be in a different format, separating the code and the description. There are also some small "problems", like 034, 595, 418, 139, 018 is there 2 times, 157, 462 with different description. I presume that Porsche used the same option code on different product families with different description, which you can see in the overall option code list mention above.


dlbass93 VIP
Total posts: 44
08 Авг 2013 22:18

Will the import be able to create new option codes? If not, I will work on getting a comprehensive list compiled into two columns. Once that is done, what is the best way to go about creating the option codes with a two column layout? Would that be a whole new type with a single field for each option code?


Sackgesicht VIP
Total posts: 1,636
08 Авг 2013 22:20

Based on the number of option codes ( >230) i would not suggest to use a check box field.

Options would be a multilevelselect field (with only 1 level activ) or better a complete new TYPE as described above.

I don't completely understand how I would have multiple Types in a single registration form, but maybe that is because I haven't tried to do it yet.

Create a SECTION "Option Codes" and a TYPE "Option Codes". The TYPE will have at least 2 fields (Description = Text, and car = child). Add a parent field "Option Codes" to your car TYPE.

With this setup you have another nice option to display on your future website --> Option Code listing powered by Cobalt (sorting, search etc)

Actually you will not have multiple types on the form. The Relate fields will take care of this.

Once you will be able to understand these fields, a whole new dimension of Cobalt will open up for you. :D

Now with the relate field, one of the "problems" will be the existing import. It does not really support an easy way of importing such a structure. But this is precisely what i suggested from the beginning for the Cobalt import. (@ETH000 - one of the said limitations)


Sackgesicht VIP
Total posts: 1,636
08 Авг 2013 22:35

If not, I will work on getting a comprehensive list compiled into two columns.

I converted your list into 2 columns and marked some "problem" rows for a start.

See attached list


Sackgesicht VIP
Total posts: 1,636
08 Авг 2013 22:55

We had a user mention having Cobalt intelligently only offering options (beyond option codes, including transmission and colors and so on) based on the VIN

That will not work, since the VIN does not include all option codes. See here

"Just by reading the VIN, specific option codes cannot be found for a car. However, you can get this information based on the VIN if you contact Porsche."

Maybe it would be worth to contact Porsche, Zuffenhausen with a request for a complete VIN list. Or is your list the official list?


Sackgesicht VIP
Total posts: 1,636
08 Авг 2013 23:43

I just opened one of your records ...

If you remove in the VIN field settings the "Restriction Message", the whole field will not be displayed for unauthorized users. IMHO better not to show it instead of a restriction message. If registered users can see it, then change it to a message like "Register to see VIN".

Current mileage shows the mileage at the point of encoding. But after a year it might be completely different. A date indicator for the mileage would be nice .. :D


Sackgesicht VIP
Total posts: 1,636
09 Авг 2013 00:11

The VIN list we have should be legit, as should the option codes associated with each.

JW, as Derrick stated above, you want to replace the underscore in the VIN with the control/checksum letter. Do you have the computation algorithm?

I found a formula, but not sure if it is the correct one... See here

Работает на Cobalt