2 views (last 30 days)
Show older comments
Alexander on 14 Aug 2023
-
-
Link
Direct link to this question
https://matlabcentral.mathworks.com/matlabcentral/answers/2008527-error-encountered-when-using-datastores-on-non-rectangular-data-sets
Commented: dpb on 15 Aug 2023
Open in MATLAB Online
Hello, I have been trying to use the following code to create a datastore for some data that has 4 rows of non-data headers at the top of the file, however I keep getting an error.
function bigDataTest(fileLoc)
% This function takes in a folder and extracts all the data into MATLAB
files = dir(fileLoc)
location = fullfile({files(3:end).folder}, {files(3:end).name});
ds = datastore(location, 'type', 'spreadsheet', 'FileExtensions', '.XLSB', 'NumHeaderLines', 4)
end
When I try to run this, I get an error that says
"Error using datastore. Unable to determine range. Range must be of the form 'A1' (cell), 'A:B' (column-select), '1:5' (row-select), 'A1:B5' (rectangle-select), or a valid named range in the sheet."
I'm assuming that this is because datastore needs a rectangular dataset to run, but the set (with the four headers removed) is rectangular. Any help is appreciated!
Thanks.
0 Comments Show -2 older commentsHide -2 older comments
Show -2 older commentsHide -2 older comments
Sign in to comment.
Sign in to answer this question.
Answers (1)
dpb on 14 Aug 2023
Open in MATLAB Online
function bigDataTest(fileLoc)
% This function takes in a folder and extracts all the data into MATLAB
ds=spreadsheetDatastore(fileLoc,"FileExtensions",".xlsb","NumHeaderLines",4);
...
end
As is, the function is useless; it doesn't return anything to the caller; ergo, it can't affect anything.
You have to either return the datastore object and use it elsewhere to actually access the data or as indicated by the ellipses I left, go ahead and use it inside the function as indicated by the comment.
Then, of course, it either has to be fully contained and do everything including outputting whatever results are expected or return that data to the caller or it's still pointless...
3 Comments Show 1 older commentHide 1 older comment
Show 1 older commentHide 1 older comment
Alexander on 15 Aug 2023
Direct link to this comment
https://matlabcentral.mathworks.com/matlabcentral/answers/2008527-error-encountered-when-using-datastores-on-non-rectangular-data-sets#comment_2849812
Edited: Alexander on 15 Aug 2023
I appreciate your concern about the function's usability (it is useless at the moment), but I changed the code to reflect your response and I am still getting the error mentioned above. I am still working on understanding how datastores work in context and this is just my trying to figure it out,
The question was mostly about how to avoid the 4 rows of header text in the data being analyzed that make the data non-rectangular. Is it better to pre-process the data to remove those 4 rows, or is there a way to get around doing that with the numHeaderLines property in the datastore?
dpb on 15 Aug 2023
Direct link to this comment
https://matlabcentral.mathworks.com/matlabcentral/answers/2008527-error-encountered-when-using-datastores-on-non-rectangular-data-sets#comment_2850102
Open in MATLAB Online
The above seems to work just fine here, I tried it for a local folder containing a collection of spreadsheets as follows
>> fnF='C:\SCCC Foundation\Financial';
>> fn=fullfile(fnF,'2019');
>> fs=spreadsheetDatastore(fullfile(fn,'DistF*.xlsx'),"NumHeaderLines",4)
fs =
SpreadsheetDatastore with properties:
Files: {
'C:\SCCC Foundation\Financial\2019\DistFunds_2019_01_Jan.xlsx';
'C:\SCCC Foundation\Financial\2019\DistFunds_2019_02_Feb.xlsx';
'C:\SCCC Foundation\Financial\2019\DistFunds_2019_03_Mar.xlsx'
... and 9 more
}
Folders: {
'C:\SCCC Foundation\Financial\2019'
}
AlternateFileSystemRoots: {}
Sheets: ''
Range: ''
Sheet Format Properties:
NumHeaderLines: 4
VariableNamingRule: 'modify'
ReadVariableNames: true
VariableNames: {'Var1', 'Var2', 'Var3' ... and 17 more}
VariableTypes: {'double', 'double', 'double' ... and 17 more}
Properties that control the table returned by preview, read, readall:
SelectedVariableNames: {'Var1', 'Var2', 'Var3' ... and 17 more}
SelectedVariableTypes: {'double', 'double', 'double' ... and 17 more}
ReadSize: 'file'
OutputType: 'table'
RowTimes: []
Write-specific Properties:
SupportedOutputFormats: ["txt" "csv" "xlsx" "xls" "parquet" "parq"]
DefaultOutputFormat: "xlsx"
>>
For a specific set of files and then with the whole folder
>> fs=spreadsheetDatastore(fullfile(fn),"FileExtensions",'.xlsx',"NumHeaderLines",4)
fs =
SpreadsheetDatastore with properties:
Files: {
'C:\SCCC Foundation\Financial\2019\2019 Budget.xlsx';
'C:\SCCC Foundation\Financial\2019\2019 CompGainLoss.xlsx';
' ...\Financial\2019\2019 IncStmt Bal Sheet_2019_12_Dec.xlsx'
... and 46 more
}
Folders: {
'C:\SCCC Foundation\Financial\2019'
}
AlternateFileSystemRoots: {}
Sheets: ''
Range: ''
Sheet Format Properties:
NumHeaderLines: 4
VariableNamingRule: 'modify'
ReadVariableNames: true
VariableNames: {'Income', 'Var2', 'Var3' ... and 37 more}
VariableTypes: {'char', 'double', 'double' ... and 37 more}
Properties that control the table returned by preview, read, readall:
SelectedVariableNames: {'Income', 'Var2', 'Var3' ... and 37 more}
SelectedVariableTypes: {'char', 'double', 'double' ... and 37 more}
ReadSize: 'file'
OutputType: 'table'
RowTimes: []
Write-specific Properties:
SupportedOutputFormats: ["txt" "csv" "xlsx" "xls" "parquet" "parq"]
DefaultOutputFormat: "xlsx"
>>
It doesn't appear the the datastore object really cares about the file content itself; it just constructs the datastore object as told; it'll then be up to the functions that use the object to be consistent with its definitions in reading the file and then is when something that isn't the same would show up.
So, don't think it's the above code that is the problem by itself.
Unless there's something that is version specific and an earlier release doesn't support the 'NumHeaderLines' option for a spreadsheet file. I'm using R2021b(R3) here...
dpb on 15 Aug 2023
Direct link to this comment
https://matlabcentral.mathworks.com/matlabcentral/answers/2008527-error-encountered-when-using-datastores-on-non-rectangular-data-sets#comment_2850142
The error message is indicative of one from readtable and friends; I've seen that or very similar if attempting to use 'NumHeaderLines' on spreadsheet files; the doc says it is only available for text files, not spreadsheets and .xlsb is a spreadsheet format.
I was surprised to see that the spreadsheet datastore accepted it with the spreadsheet extension.
Show us the whole error message in context with the code that generated it; the datastore creation itself won't give me that error although if I try to set it arbitrarily for a set of files it discovers the files don't all then fit a given number of header lines and it then can't find variable names or other parsing problems.
It could/would help to see a typical file you're trying to process and how you're then trying to read it, not just this one line in isolation.
Sign in to comment.
Sign in to answer this question.
See Also
Categories
MATLABData Import and AnalysisData Import and ExportStandard File FormatsSpreadsheets
Find more on Spreadsheets in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!
An Error Occurred
Unable to complete the action because of changes made to the page. Reload the page to see its updated state.
Select a Web Site
Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .
You can also select a web site from the following list
Americas
- América Latina (Español)
- Canada (English)
- United States (English)
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- Deutsch
- English
- Français
- United Kingdom(English)
Asia Pacific
- Australia (English)
- India (English)
- New Zealand (English)
- 中国
- 日本Japanese (日本語)
- 한국Korean (한국어)
Contact your local office