get data from nc file

get data from nc file

Antonio Silva
Dear list members

I downloaded some nc files with precipitation data from
https://pmm.nasa.gov/data-access/downloads/trmm (Level 3 3B43:
Multisatellite Precipitation). For the image link see the global attribute
"history" (below).

With ncdf4::nc_open I could open the file (nc.data <-
nc_open("3B43.20080101.7A.HDF.nc")).

I want to extract the "StartGranuleDateTime" but it is inside the global
attribute FileHeader (see below).

With ncatt_get(nc.data,0,"FileHeader")$value I got
[1]
"AlgorithmID=3B43;\nAlgorithmVersion=3B43_7.0;\nFileName=3B43.20080101.7A.HDF;\nGenerationDateTime=2012-11-29T19:12:01.000Z;\nStartGranuleDateTime=2008-01-01T00:00:00.000Z;\nStopGranuleDateTime=2008-01-31T23:59:59.999Z;\nGranuleNumber=;\nNumberOfSwaths=0;\nNumberOfGrids=1;\nGranuleStart=;\nTimeInterval=MONTH;\nProcessingSystem=PPS;\nProductVersion=7A;\nMissingData=;\n"

Is there any way to extract only the string "2008-01-01T00:00:00.000Z"?

The best I could do was
as.Date(substr(strsplit(ncatt_get(nc.data,0,"FileHeader")$value,";\n")[[1]][5],22,45),"%Y-%m-%dT%H:%M:%OSZ")

but I suppose there must be a more direct way of getting the data. I would
appreciate any suggestions.
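
One alternative to counting character positions is to match the field by name with a regular expression. This is only a minimal sketch, assuming the attribute value has exactly the form printed above. The names `header`, `m` and `start_txt` are made up for illustration, and the string is pasted in so the example runs without the file:

```r
# The FileHeader value as printed above, pasted in for a self-contained example.
header <- paste0(
  "AlgorithmID=3B43;\nAlgorithmVersion=3B43_7.0;\nFileName=3B43.20080101.7A.HDF;\n",
  "GenerationDateTime=2012-11-29T19:12:01.000Z;\n",
  "StartGranuleDateTime=2008-01-01T00:00:00.000Z;\n",
  "StopGranuleDateTime=2008-01-31T23:59:59.999Z;\nGranuleNumber=;\n",
  "NumberOfSwaths=0;\nNumberOfGrids=1;\nGranuleStart=;\nTimeInterval=MONTH;\n",
  "ProcessingSystem=PPS;\nProductVersion=7A;\nMissingData=;\n"
)

# Find "StartGranuleDateTime=" at the start of a field and capture
# everything up to the next ";". m holds the full match and both groups.
m <- regmatches(header,
                regexec("(^|\n)StartGranuleDateTime=([^;]*);", header))[[1]]
start_txt <- m[3]   # second capture group: the timestamp text

start_txt
as.Date(start_txt, format = "%Y-%m-%dT%H:%M:%OSZ")
```

With the real file, `header` would be `ncatt_get(nc.data, 0, "FileHeader")$value`. Matching by name does not depend on the field being the fifth one or on its exact width.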

Best regards,

Antonio Olinto
Fisheries Institute
Brazil

nc.data
File 3B43.20080101.7A.HDF.nc (NC_FORMAT_CLASSIC):

     1 variables (excluding dimension variables):
        float precipitation[nlat,nlon]
            units: mm/hr
            coordinates: nlon nlat
            _FillValue: -9999.900390625

     2 dimensions:
        nlon  Size:33
            long_name: longitude
            standard_name: longitude
            units: degrees_east
        nlat  Size:41
            long_name: latitude
            standard_name: latitude
            units: degrees_north

    5 global attributes:
        Grid.GridHeader: BinMethod=ARITHMETIC_MEAN;
Registration=CENTER;
LatitudeResolution=0.25;
LongitudeResolution=0.25;
NorthBoundingCoordinate=50;
SouthBoundingCoordinate=-50;
EastBoundingCoordinate=180;
WestBoundingCoordinate=-180;
Origin=SOUTHWEST;

        FileHeader: AlgorithmID=3B43;
AlgorithmVersion=3B43_7.0;
FileName=3B43.20080101.7A.HDF;
GenerationDateTime=2012-11-29T19:12:01.000Z;
StartGranuleDateTime=2008-01-01T00:00:00.000Z;
StopGranuleDateTime=2008-01-31T23:59:59.999Z;
GranuleNumber=;
NumberOfSwaths=0;
NumberOfGrids=1;
GranuleStart=;
TimeInterval=MONTH;
ProcessingSystem=PPS;
ProductVersion=7A;
MissingData=;

        FileInfo: DataFormatVersion=m;
TKCodeBuildVersion=1;
MetadataVersion=m;
FormatPackage=HDF Version 4.2 Release 4, January 25, 2009;
BlueprintFilename=TRMM.V7.3B43.blueprint.xml;
BlueprintVersion=BV_13;
TKIOVersion=1.6;
MetadataStyle=PVL;
EndianType=LITTLE_ENDIAN;

        GridHeader: BinMethod=ARITHMETIC_MEAN;
Registration=CENTER;
LatitudeResolution=0.25;
LongitudeResolution=0.25;
NorthBoundingCoordinate=50;
SouthBoundingCoordinate=-50;
EastBoundingCoordinate=180;
WestBoundingCoordinate=-180;
Origin=SOUTHWEST;

        history: 2018-12-26 17:57:56 GMT Hyrax-1.13.4
https://disc2.gesdisc.eosdis.nasa.gov:443/opendap/TRMM_L3/TRMM_3B43.7/2008/3B43.20080101.7A.HDF.nc?precipitation[604:636][3:43],nlat[3:43],nlon[604:636]

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: get data from nc file

Ben Tupper
Hi,

Yikes.  I don't think there is any other way as the attributes are sort of buried in the string; that's unfortunate.  I guess you could at least make a reusable function assuming you'll be doing this again or looking to pull other attributes.  Something like this...


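#' Extract one of the global attributes of a TRMM NetCDF file as a named vector
#'
#' @param nc the ncdf4 object
#' @param name the name of the global attribute
#' @param sep the separator used to delimit fields in the attribute
#' @return named character vector of attributes, or NULL if the attribute is absent
nc_att_split <- function(nc, name = "FileHeader", sep = ";\n"){

        # all global attributes come back as a named list; pick the one requested
        a1 <- ncdf4::ncatt_get(nc, 0)[[name[1]]]
        if (is.null(a1)) return(a1)

        # split into "key=value" fields (using sep), then split each field at "="
        a2 <- strsplit(a1, sep, fixed = TRUE)[[1]]
        aa <- strsplit(a2, "=", fixed = TRUE)

        # fields with an empty value (e.g. "GranuleNumber=") become ""
        x <- sapply(aa,
                function(s) if (length(s) <= 1) "" else s[2]
                )
        # keep the key even when the value is empty; "unknown" only if there is no key
        names(x) <- sapply(aa,
                function(s) if (length(s) < 1) "unknown" else s[1]
                )

        x
}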


nc <- ncdf4::nc_open("3B43.20080101.7A.HDF.nc")
x <- nc_att_split(nc)
as.Date(x[['StartGranuleDateTime']], format = "%Y-%m-%dT%H:%M:%OSZ")
[1] "2008-01-01"
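
If the time of day matters (for example for StopGranuleDateTime), the same format string can be passed to as.POSIXct instead of as.Date. This is a small sketch, assuming the timestamps are UTC as the trailing "Z" suggests. The two values are copied from the FileHeader in the original post so the example runs without the file, and the name `times` is just for illustration:

```r
# Two fields copied from the FileHeader shown in the original post.
x <- c(StartGranuleDateTime = "2008-01-01T00:00:00.000Z",
       StopGranuleDateTime  = "2008-01-31T23:59:59.999Z")

# as.POSIXct keeps the clock time. The trailing "Z" is only matched as a
# literal character by the format, so the time zone is set explicitly.
times <- as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")

# Date plus hour and minute of each timestamp
unname(format(times, "%Y-%m-%d %H:%M"))

# Length of the granule in days
difftime(times[2], times[1], units = "days")
```

as.Date is still the simpler choice when only the calendar day is needed.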


Cheers,
Ben

Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org

Ecological Forecasting: https://eco.bigelow.org/

Re: get data from nc file

Antonio Silva
Thanks for your nice idea and code, Ben. I will use them for sure.

Have a wonderful 2019. All the best

Antonio Olinto
Fisheries Institute
Brazil

